Category Archives: 12621

Cooking With CUBEs



As we should all know by now, PowerPivot provides Excel with a powerful way to harness data from one or more sources, and to do further analysis on that data within familiar pivot tables.


Furthermore, because PowerPivot is creating an in-memory cube of the data, it is possible to build an analysis using CUBE formulae. I have blogged a couple of times about CUBE formulae, in Cycling Through The Fog and in Cracking The Code.


In Excel, as with any development, you want your solution to be as flexible and dynamic as possible. This blog is about building dynamic tables using CUBE formulae, but to start with the following formula shows an example of a value extracted from a PowerPivot model using CUBE functions


=CUBEVALUE(“PowerPivot Data”,
CUBEMEMBER(“PowerPivot Data”,”[Measures].[Sum of SalesAmount]“),
CUBEMEMBER(“PowerPivot Data”, “[DimProductCategory].[EnglishProductCategoryName].&[Bikes]“),
CUBEMEMBER(“PowerPivot Data”,”[DimDate].[FiscalYear].&[2006]“))


Equation 1


This formula gets the Sales Amount from the PowerPivot cube for the Bikes product category, for the fiscal year 2006. There will be many values at this intersection, there can be many dates in 2006 and many products within that category, all pre-aggregated in the cube; the CUBEVALUE function returns that aggregate amount


We could build the whole table of values using similar formulae. In our table we need to know what the value is related to, so we have row and column headers that identify the intersection points. We could define those headers using the CUBEMEMBER functions giving a table such as shown in Figure 1 below, which shows a table based on AdventureWorks.



Figure 1 – Table of values over year and product category


The formulas for the headings are


=CUBEMEMBER(“PowerPivot Data”,”[DimDate].[FiscalYear].&[2006]“)
=CUBEMEMBER(“PowerPivot Data”,”[DimDate].[FiscalYear].&[2007]“)
etc. for the column headings,


Equation 2


and


=CUBEMEMBER(“PowerPivot Data”, “[DimProductCategory].[EnglishProductCategoryname].&[Accessories]“)
=CUBEMEMBER(“PowerPivot Data”, “[DimProductCategory].[EnglishProductCategoryname].&[Bikes]“)

etc. for the row headings.


The values at the intersection points simply use these heading cells like so


=CUBEVALUE(“PowerPivot Data”,”[Measures].[Sum of SalesAmount]“,$A3,B$1)


This is equivalent to the formula given in Equation 1.


Slicing the Vegetables


Further richness is bestowed upon us because we can also link slicers to our table, giving us the sort of filtering we have with the pivot tables. For example, Figure 2 shows the same data table built using CUBE formulae with a fiscal year slicer; the data reflecting the fact that only the years 2006, 2007, and 2008 have been selected.



Figure 2 – Table of values reflecting years slicer selections


Showing the slicer selections on your report has been covered elsewhere, but it is so useful and asked so often that I thought I would also cover. I also have a couple of variations that I haven’t seen elsewhere, which are worth presenting.


The Menu


Previously, as shown in the formulae in Equation 2, we built the row and column headers using hard-coded values for the year and category fields. We need to be more dynamic in how we list these values. To show the slicer selections as in E5, F5, etc., we need a list of values from which we can choose and display the individual ordered items. The CUBESET function gives us this. The syntax for CUBESET is


CUBESET(connection. set_expression, , [sort_order], [sort_by])


where connection is the cube, set_expression is the set of values required, and caption is a value to display. So, looking at cell D1 we have the formula


=CUBESET(“PowerPivot Data”,Slicer_FiscalYear,”Set of Years


Equation 3


which would look as shown in Figure 3 when added to cell D1 to build our set of fiscal years.



Figure 3 – Slicer years set formula


As can be seen, we use Slicer_FiscalYear as the set_expression, so the set will include all selected values in that slicer, with the caption signifying the cell contents.


So far, so good, but we still need to list those selected values. For this, we use the CUBERANKEDMEMBER function, which returns the nth, or ranked, member in a set. The syntax of this


=CUBERANKEDMEMBER(connection, set_expression, rank, )


where connection is the cube as before, set_expression is the set of values to choose from, and rank is nth value. So, to get the first member, we use


=CUBERANKEDMEMBER(“PowerPivot Data”,$D$1,1)


for the second


=CUBERANKEDMEMBER(“PowerPivot Data”,$D$1,2)


and so on.


Because there are 5 years in the PowerPivot model, and when filtered in the slicer we might be showing less than 5, we need to cater for a variable number of items. The simplest way is just to add an error wrapper around the formula,


=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$1,1),””)


=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$1,2),””)


etc.


Equation 4


Why Extra Ingredients?


We could nest the CUBESET function within the CUBERANKEDMEMBER,but that would mean that a set is evaluated 5 times. By defining the set in its own cell and referring to that cell within the CUBERANKEDMEMBER function, it is evaluated just the once. A small matter, but it makes the spreadsheet easier to maintain, and is more efficient.


Cooked To Perfection


It’s as simple as that.


But hang on a minute, have we overcooked it?


Looking at the syntax definition for these two functions, we can see that they both take set_expression as an argument. The CUBESET function is passed the slicer values as its set, and in turned is passed to the CUBERANKEDMEMBER function as its set.


As the slicer values is a set_expression, you would think that we should be able to pass the slicer values directly to CUBERANKEDMEMBER as a set and be done with. And so we can, these formulae


=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,Slicer_FiscalYear,1),””)


=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,Slicer_FiscalYear,2),””)


etc.


Equation 5


work equally as well as those formulae in Equation 4


Managing The Ingredients


The list of product categories can also be listed in a similar way using CUBESET and CUBERANKEDMEMBER. Here we do need CUBESET as there is no pre-defined set of values as we had with the fiscal year that we can pick up. The set will be all values for the Product Category English name in the Product Category table,


=CUBESET(“PowerPivot Data”,
“[DimProductCategory].[EnglishProductCategoryName].Children”,
“Set of Categories”)


Equation 6


As can be seen, .Children gets us all of the category values.


One thing to note is the use of the caption argument. Again, this helps to highlight the cell containing the set.


We now have formulae that can define our full table, such as


D1: the formula in Equation 3
=CUBESET(“PowerPivot Data”,Slicer_FiscalYear,”Set of Years”)


D2: the formula in Equation 6
=CUBESET(“PowerPivot Data”,
“[DimProductCategory].[EnglishProductCategoryName].Children”,
“Set of Categories”)


E5:I5: the formulae in Equation 5
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,Slicer_FiscalYear,1),””)
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,Slicer_FiscalYear,2),””)

etc.,


D6:D9: formulae for the product categories
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$2,1),””)
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$2,2),””)
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$2,1),””)
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$2,2),””)


And finally, in E6:I9, the formulae for the values
=IFERROR(CUBEVALUE(“PowerPivot Data”,”[Measures].[Sum of SalesAmount]“,$D6,E$5),””)
etc., each cell reflecting the correct product category ($D6) and fiscal year (E$5).


Our table now has a full set of values, and reflects the choices made in the fiscal year slicer.


(If we wished, we could add the product categories to a slicer, and make our table dynamically reflect that.)


Ready to Serve?


Although we have been diligent in storing the evaluated sets in one place rather than nest a CUBESET function within the CUBERANKEDMEMBER function, but there are still a number of things going on here that I just don’t like:


  • The connection is hard-coded, multiple times
  • If a new year is added to the data, just copying cell I5 to J5 won’t work because the rank is hard-coded in the formula, it will need a small change
  • If no selection is made in the fiscal year slicer, the values shown are the total of all years, with a header value of All, as shown in Figure 4. This may be what is required in some instances, in others we may want to show each year’s values individually.


Figure 4 – Showing all years as a total with no slicer selections


These ‘difficulties’ can be overcome relatively easily.


Rather than hard-code the connection within each formula, put the connection text ‘PowerPivot data’ (without the quotes), in a cell, say D3, and assign an Excel name _cube, then use that defined name within the formulae. Note that this also makes the transition to Excel 2013 simpler, where the connection has now changed to ‘ThisWorkbookDataModel’ (again, without the quotes).


The rank is managed by using a function that returns a variable number depending upon the row or column of the cell, namely ROW([reference]) or COLUMN([reference]). You might think that you can use COLUMN(A1) in E5, and copy that across so that it updates to COLUMN(B1), COLUMN(C1), etc. Believe me, this is a very bad idea. Although everything will be fine at first, what happens if you decide to insert a column before column E? The answer is that COLUMN(A1) will update to COLUMN(B1) and whereas the first column of year table originally reflected the first selected year in the fiscal year slicer, it will now reflect the second. You might say that you would never do that, but no-one ever does until they do. For the sake of a simple change it is hardly worth risking it.


The suggested change is to use COLUMN()-COLUMN($D$5), which uses the top left cell of our table as an anchor point. Thus, a formula in cell E5 using these functions will return 1 for that calculation, and so on. If a column is inserted to the left of the table, those parts of formulae will update to COLUMN()-COLUMN($E$5), which means the formula that was in cell E5 which has now moved to cell F5 still return 1 for that calculation.


Similarly, the category list will use ROW()-ROW($D$5).


Finally, how can we show each year in the column headers and the vales for those years when no slicer selection is made, rather than showing ‘All’ and totals for all years? We already have the formula in D1 that gets the set of selected slicer years, that is


=CUBESET(_cube,Slicer_FiscalYear,”Slicer Years”)


As we showed before, we can get a set of all year regardless of slicer selection with the CUBESET function and the member’s children property, as in


CUBESET(_cube,”[DimDate].[FiscalYear].Children”,”Dimension Years”)


But how do we know when to use which? One way would be to test whether the first member of this slicer set returns All. If it does, there are no slicer selections so we show all years individually, if not we show the slicer selected years. We can check the first slicer set value with the following


IF(CUBERANKEDMEMBER(_cube,Slicer_FiscalYear,1)<>”All”, …


Adding all three elements together, we have the following formula in D1 that determines what goes into the set of years that will drive the table column headings


=IF(CUBERANKEDMEMBER(_cube,Slicer_FiscalYear,1)<>”All”,
CUBESET(_cube,Slicer_FiscalYear,”Slicer Years”),
CUBESET(_cube,”[DimDate].[FiscalYear].Children”,”Dimension Years”))


With this formula to get the years set, we can see all the years listed when no selections are made on the ribbon rather than showing all year totals, as in Figure 5.



Figure 5 – Showing all years with no slicer selections


What’s For Dessert?


That’s about it. Using this technique we have a table that shows the value by year by product category, with a slicer for selecting specific years which is reflected in the years shown in the table. The years and product categories are dynamically built and so can accommodate extra years and extra categories in the source data, and the years can also handle a full slicer set without showing the values as totals for all years.

Cutting Edge BI



I went to the SQL Server 2008 R2 launch in London on 15th April 2010 (a while ago I know, but I have been away for a week :-)). I am not a SQL Server DBA or anything such, I went specifically for the BI sessions, PowerPivot and Reporting Services.


I have to say that I enjoyed the day enormously, much more so than the Visual Studio 2010 launch the previous Monday (the Visual Studio day was informative, but a bit dry in comparison), but the reason for this post is to talk about the case study that was presented on that day.


In the opening session, the case study was given by Yasmeen Ahmad and Dr Mark Whitehorn of Dundee University. Mark is in the university’s school of computing; Yasmeen was previously one of Mark’s undergraduates, but is now doing her PhD as part of the Wellcome Trust Centre for Gene Regulation & Expression. Mark is a big advocate of BI, it is listed as his resaerch theme in his university profile, and Microsoft’s stack approach, and Yasmeen has obviously adopted the same enthusiasm, and is working in the Wellcome Trust Centre utilising her BI development skills (no doubt, learning much about proteomics along the way).


The case study was based upon Yasmeen’s development of a BI tool for the analysis and management of proteomics data. I spoke to Yasmeen in the break about her work, and was enthused enough to want to learn more.


With details provided by Yasmeen, I found the this publication which details this work, the creation of a ‘Protein Frequency Library’; a data environment (PepTracker); and an OLAP cube to provide the researchers with the ability to analyse their data. The paper is naturally heavy on the science, and I whilst I was soon lost in the scientific details (I have asked my daughter to explain it to me :-)), I was able to understand enough contextually to enjoy the BI parts, the development of the model and the analysis.


I just found it illuminating to see BI applied to cutting edge science in this way, not an obvious area.


Any errors in the above are undoubtedly mine, not due to Mark or Yasmeen.

Cycling Through The Fog

Dynamic Analysis

Excel 2007 provides the capability to have a pivot table connected to an OLAP cube. Furthermore, detailed analysis can be built by using various CUBE functions. One of the great things about Excel 2007 pivots and CUBE functions is that you can use a value from the pivot as a member value within a CUBE function. For instance, a CUBEVALUE might be getting the net gas sales for a particular company and period with

CUBEVALUE(CubeName,

                       “[Transaction Company].[Company Drilldown].&[17]”,

                       “[Activity Date].[Date Drilldown].[Month].&[1]&[1]&[2009]“),

                       “[Measures].[Gas_Amt_Net]”)

If you also have a pivot that is filtering the company drilldown, the current selection can be substituted in that formula and achieve the same results,

CUBEVALUE(CubeName,

                       ‘Pivot Sheet’!B10,

                       “[Activity Date].[Date Drilldown].[Month].&[1]&[1]&[2009]“),

                       “[Measures].[Gas_Amt_Net]”)

Obviously, this gives us a dynamic capability, our analysis can reflect the currently selected pivot value(s).

Next/Previous Function

I have a spreadsheet that has built a comprehensive analysis using this technique.  This all works well, but I was looking for another useful function. I needed to be able to provide Next/Previous buttons that would enable cycling the analysis through each of the top-level members of the Company Drilldown without the user having to go back to the pivot and reselect the Next/Previous item. The Company Drilldown was a PageField in my pivot.

As far as I am aware, there is no built-in way to get the next or previous item in a pivot filter, this would have been the most obvious and simplest way, but nt to be.

My next thought was that if I could somehow obtain the MDX for the each item, I could create a CUBEMEMBER function for that item, and point all of my CUBEVALUE functions at that cube member. But how to get the MDX?

My first thought was to query the cube directly using the drilldown dimension, but that would get the values, but not the MDX.

The MDX must be somewhere within the pivot, otherwise I cannot see how a reference to the pivot field works when embedded in a CUBE function, but with all of the delving into the properties of the pivot object, I could not see where.

Retrieveing PageField members

I did figure out how to get details of all members of the PageField by iterating through the cube as in the example below.

 

Private Function IterateCubeToGetMembers()
Dim mpConn As ADODB.Connection
Dim mpCat As ADOMD.Catalog
Dim mpDim As ADOMD.Dimension
Dim mpHier As ADOMD.Hierarchy
Dim mpLevel As ADOMD.Level
Dim mpMember As ADOMD.Member
Dim mpData As Range
Dim mpCell As Range
Dim i As Long, j As Long

    Set mpData = wsApp.Range(NAME_REF_LIST_HIERARCHY)
    Set mpConn = New Connection
    mpConn.Open “Provider=MSOLAP;” & _
                “Initial Catalog=<;” OLAP_SERVER_NAME & _
                “Data Source”, OLAP_DB_NAME)
    Set mpCat = New ADOMD.Catalog

    With mpCat.CubeDefs(OLAP_CUBE_NAME)

        ReDim appVecHierarchies(1 To mpData.Rows.Count)
        i = 0

        For Each mpCell In mpData.Columns(1).Cells

            Set mpDim = .Dimensions(“[Company Drilldown]“)
            Set mpHier = mpDim.Hierarchies(mpCell.Offset(0, 1).Value)
            Set mpLevel = mpHier.Levels(mpCell.Offset(0, 2).Value)
            ReDim appVecMembers(1 To mpLevel.Members.Count)
            j = 0

            For Each mpMember In mpLevel.Members

                j = j + 1
                appVecMembers(j) = mpMember.Name
            Next mpMember

            i = i + 1
            appVecHierarchies(i) = appVecMembers
        Next mpCell
    End With

    Set mpDim = Nothing
    Set mpHier = Nothing
    Set mpLevel = Nothing
    Set mpMember = Nothing
    Set mpCat = Nothing
    Set mpConn = Nothing
    Set mpData = Nothing
End Function

 

Note, mpData.Columns(1).Cells is a list of different dimensions that I load into the PageField depending upon user selection.

This is a nice technique, but still only gets me the member values, it does not get the associated MDX.

Getting The MDX

Somewhere along my investigations, I recalled seeing that the pivotitem was showing the MDX in the SourceName property.

Getting the PivotItems on a PageField was problematical, so I couldn’t use that directly. But what if the drilldown dimension was not a PageField, but was in the rows area for example?

Obviously, I am not at liberty to change the original pivot, but I could create another one, load the drilldown dimension into the rows area, then loop through the PivotItems in PivotField(1) (we set it up with just the single pivot field). These PivotItems have the MDX in the SourceName property.

Final Solution

The following function builds arrays of the member and MDX values. It does it by

-          adding
a sheet

-          setting
up the pivot table

-          loading
the dimensions one by one into the Rows area

-          retrieving
all of the PivotItem values.


 


Private function LoadHierarchies()
CONST PIVOT_TEMP as string = “pvtTemp”
Dim appVecHierarchies As Variant
Dim appVecMembers As Variant
Dim mpSheet As Worksheet
Dim mpData As Range
Dim mpCell As Range
Dim mpPivotItem As PivotItem
Dim i As Long, j As Long

    Set mpSheet = Worksheets.Add(After:=Worksheets(Worksheets.Count))
    ThisWorkbook.PivotCaches _
        .Create(SourceType:=xlExternal, _
                SourceData:=ThisWorkbook.Connections(CONNECTION), _
                Version:=xlPivotTableVersion12) _
        .CreatePivotTable TableDestination:=mpSheet.Name & “!R5C4″, _
                          TableName:=PIVOT_TEMP, _
                          DefaultVersion:=xlPivotTableVersion12

    Set mpData = wsApp.Range(NAME_REF_LIST_HIERARCHY).Columns(1)
    ReDim appVecHierarchies(1 To mpData.Cells.Count)
    For Each mpCell In mpData.Cells
   
        With mpSheet.PivotTables(PIVOT_TEMP)
          
            ‘store the next value in the pivot rows
            With .CubeFields(mpCell.Value)
                .Orientation = xlRowField
                .Position = 1
            End With

            With .PivotFields(1)

                j = 0
                ReDim appVecMembers(1 To .PivotItems.Count, 1 To 2)
                For Each mpPivotItem In .PivotItems

                    j = j + 1
                    appVecMembers(j, 1) = mpPivotItem.Caption
                    appVecMembers(j, 2) = mpPivotItem.SourceName
                Next mpPivotItem

                ‘we save the member details as an array within an array
                i = i + 1
                appVecHierarchies(i) = appVecMembers
            End With

            ‘remove the pivotfield ready for next
            .CubeFields(mpCell.Value).Orientation = xlHidden
        End With
    Next mpCell

    mpsheet.Delete

    Set mpPivotItem = Nothing
    Set mpSheet = Nothing
    Set mpCell = Nothing
    Set mpData = Nothing
End Function


appVecHierarchies and appVecMembers are global arrays for storing the member values and MDX.


One Oddity


On a slight tangent, I found one oddity in this technique.


When the company drilldown was loaded into Page area, all of the companies in the cube showed up. When loaded into the Rows area, only some of the companies showed. This was due to the fact that some of the companies had no associated data.


Summary


I have a solution, but it is slow to load, and I cannot say I am happy with it.


It seems to me that the MDX must be held somewhere in the pivot so that the link from cube functions work, but I cannot find it.


I would welcome any thoughts on this, a better technique, or anything else.