Category Archives: 14793

CUBEs For Dessert



Introduction


A few weeks ago I showed how to create a fully dynamic table using CUBE formulae against a PowerPivot model in Cooking With Cubes.This example presented data from the AdventureWorks database of sales of product categories over years, all integrated with a Fiscal Year slicer.


I recently had a similar need, but with the further challenge of presenting two levels in a hierarchy (actually, this was related tables in a snowflake schema). I needed to show all of the items in the first level, and for each of these items, all of the associated items at the second level including those items that have no sales value. The AdventureWorks database has a similar setup with products, where each sale is for a particular product, each product is of specific product sub-category, and each product sub-category is of a specific product category. Within this model, we would be showing sub-categories of Mountain Bikes and Road Bikes against the category of bikes, but not against any other category.


 


I will demonstrate my solution using the same AdventureWorks data. Whilst this will be similar to the previous post in showing how to manage the presentation using Excel formulae, it is extended to multiple data levels and some more complex formatting. Additionally, and the main point of this post, I show a technique to display the related values in that hierarchy, not simple with straight CUBE formulae, by adding some MDX trickery in the CUBE formulae.


Figure 1 shows the final results of all of the formulae and formatting.



Figure 1 – Final Results


Preparing With Good Ingredients


First I need to prepare the PowerPivot model so that I have have product name, product sub-category name, and product category name all on the Product table. To achieve this, I added two calculated columns


[ProductSubcategoryName]
=RELATED(ProductSubcategory[ProductSubcategoryName]
[ProductCategoryName]
=RELATED(ProductCategory[ProductCategoryName]
Formula 1 – PowerPivot Model Calculated Columns


Mixing It Up


The data is ready, so let’s build the required data. I need the following:


  1. a text definition of the cube
  2. a set of all categories
  3. a count if the items in the category set
  4. a list of all of the items in the year set, from the slicer or the dimension if no slicer selections
  5. a count if the items in the year set
  6. a list of all of the items in the category set (in columns)
  7. for each listed category, a set of all associated sub-category items
  8. for each sub-category set, a count of all items in that set
  9. a running total of the sub-category set counts
  10. for each sub-category set, a list of items (in rows)
  11. a reference cell for the sales measure

Taking these in the order given, showing cell(s), defined name, definition:


1. Cube Definition


This is simple text for the connection argument to the cube functions.
J1:[_model]PowerPivot Data
I am using Excel 2010 for this example, Excel 2013 would use the value ThisWorkbookdataModel.
In the previous example, I named this cell _cube, here I use _model.


2. Category Set


This is the set of all categories in the model. This is being taken from the calculated column for product category name in the Products table.
J2:[_setCategory]=CUBESET(_model,”[Product].[ProductCategoryName].Members”,”Categories”)


3. Category Set Count


A simple count of the items in the category set, used in later dynamic formulae.
J3:[_setCategoryCount]=CUBESETCOUNT(_setCategory)-2
I subtract two from the actual count because this set contains the All member and blank, I am not interested in these.


4. Year Set


This is the set of years in the model, constrained by the slicer.
J4:[_setYear]
=IF(CUBERANKEDMEMBER(_model,Slicer_FiscalYear,1)<>”All”,
CUBESET(_model,Slicer_FiscalYear,”Slicer Years”),
CUBESET(_model,”[Date].[FiscalYear].Children”,”Dimension Years”))

As in the previous example, to show all years with no slicer selection (not show All), I check the slicer to see the first set member is All, getting the set from the Date dimsnsion if so, otherwise get the set from the slicer.


6. Year Set Count


A simple count of the items in the year set.
J3:[_setYearCount]=CUBESETCOUNT(_setYear)


6. List of Category Set Items


A list of the items in the category set, for use in building the sub-category sets.
K2:N2:[_listCategory]=IFERROR(CUBERANKEDMEMBER(_model,$J$2,COLUMN()-COLUMN($J$2)+2),””)
The name definition uses a dynamic formula
=OFFSET(_setCategory,0,1,1,_setCategoryCount)
This is a simple dynamic OFFSET formula to build a list of the categories.


7. Sets of Subcategory Items


Sets of sub-category items, one set per category.
K3:N3:[_setsSubcategory]=IFERROR(CUBESET(_model,”EXISTS([Product].[ProductSubcategoryName].Members,”&K$2&”)”,”Products:”&K$2),””)
This is where the MDX magic kicks in. The EXISTS function … Returns the set of tuples of the first set specified that exist with one or more tuples of the second set specified …, so we can use it within a CUBESET function to get all of the product subcategory names for the category above, in _listCategory.
Using this technique, instead of building a set of all subcategories, we build a set of the subcategories for a particular category; in our case the category passed to the CUBESET function from the cells in _listCategory.
I prefix the text shown with the text Products: for readability
The name definition uses a dynamic formula
=OFFSET(_setCategory,1,1,1,_setCategoryCount)


8. List of Subcategory Set Counts


A list of counts for the sub-category sets, for use in the layout.
K4:N4:[_setsSubcategoryCounts]=INDEX(CUBESETCOUNT(_setsSubcategories),COLUMN()-COLUMN(_setCategory))
The name definition uses a dynamic formula
=OFFSET(_setCategory,2,1,1,_setCategoryCount)


9. List of Subcategory Set Count Totals


A list of running totals for the sub-category set counts, for use in the layout.
K5:O5:[_setsSubcategoryTotals]=SUM($K$4:K4)-K$4+1
The name definition uses a dynamic formula
=OFFSET(_setCategory,3,1,1,_setCategoryCount+1)


10. Lists of Sub-category Sets Items


Lists of each set sub-category items, within the relevant category.
K6:N30:[_listSubcategoryies]=IFERROR(CUBERANKEDMEMBER(_model,K$3,ROW()-ROW(_setCategory)-2),””)
This is using the same technique to list the set items as in the previous blog.
The name definition uses a dynamic formula
=OFFSET(_setCategory,4,1,MAX(_setsSubcategoryCounts)-1,_setCategoryCount))
Unlike the other dynamic formulae, this one is multi-column, for the multiple categories.


11. Sales Measure


Finally, we have a cell defining the Sales measure
J7:[_sales]=CUBEMEMBER(_model,”[Measures].[Sum of SalesAmount]”,”Sales”)
This is to save repeated definition within the value formulae.


Figure 2 and Figure 3 below show all of the helper cells populated, and the nanmes defined in the NameManager.



Figure 2 – Populated Helper Cells




Figure 3 – Defined Names


It’s All About The Presentation


All that needs to be done now is to lay it out, allowing for a variable number of categories and a differing variable number of sub-categories within each category.


In my layout, I am repeating each category a number of times equal to to the number of subcategores for that category, but only showing the first instance. The years are shown as column headings, using the year slicer to filter the columns. The values are displayed referencing the category, sub-category, and year headings.


The formulae are as follows:


Category items


A7:An:=IFERROR(INDEX(_listCategory,MATCH(ROW()-ROW(_start),_setsSubcategoryTotals,1)),””)
This matches the ordinal row number within the list of categories against the running totals for sub-category set counts, so as to determine which of the categories we are listing.

 


Sub-category items


B7:Bn:=IFERROR(IF(A7<>A6,””,INDEX(_listSubcategories,COUNTIF($A$7:A7,A7)-1,MATCH(ROW()-ROW(_start),_setsSubcategoryTotals,1))),””)
This uses the same matching technique to determine which sub-category set to list.

 


Filtered Years


D6:H6:=IFERROR(CUBERANKEDMEMBER(_model,_setYear,COLUMN()-COLUMN(_start)-2),””)
Although the slicer controls the selected years, the year set in _setYear outsorts the data, the filtering is done there.

 


Sales Totals


C7:C50:=IFERROR(CUBEVALUE(_model,_sales,IF($A7<>$A6,$A7,$A7:$B7),Slicer_FiscalYear),””)
This determines the sum of the values for the current row, be that just a category row, or a category and sub-category row, whilst respecting the slicer selection. The IF in the formula caters for that possibility of a category row or a category and sub-category row.

 


Sales Values


D7:H50:=IFERROR(CUBEVALUE(_model,_sales,D$6,IF($A7<>$A6,$A7,$A7:$B7)),””)
Similar to the sales totals, but using the year heading instead of the slicer, beacuse the year heading has already take the slicewr into account.

 


Conditional formatting is added to the results to improve the viusual effect and readability.


Product Category Labels


The list of product categories is repeated on every row, changing at the appropriate point of course. This value is needed for the value calculation, but I feel the look is improved if the repeated items are suppressed, so I use conditional formatting to set the font colour to white on repeated items. This uses a simple formula of

A7:A50:=A7=A6


Product Category Totals


I list the product category one more time that its associated sub-categories. This is so that I can show category totals as well as category and sub-category totals. These need highlighting so that the totals stand out. To achieve this, I have made them bold with a blue font, borders above and below. The conditional formatting formula for this is

A7:H50:=AND($A7<>””,$A7<>$A6,ROW($A7)-ROW(_start)-1<=MAX(_setsSubcategoryTotals),COLUMN(A6)-COLUMN(_start)-3<=_setYearCount)


But Is It All Too Rich?


One question that may occur to anyone reading this who is skilled with pivot tables is, why bother? Couldn’t the same effect be achieved by formatting a pivot table showing the product category and sub-category and years with the format set to Show Tabular Form and Repeat All Item Labels? This would give a similar look, but it would only show product categories and sub-categories that have a value, it would not show those that are not selling. Agreed, there is a pivot option to Show items with no data on rows which does address this, but the problem here is that it will show all product sub-categories under EVERY product category, my method only shows the relevant sub-categories.


In Summary


In many ways this is similar to the previous post on CUBE formulae, with an extra level of data and more complex formatting. But what I hope that it shows is that by utilising some fairly straightforward MDX the formulae can do far more complex operations, and extend the data displayed. And who knows how far this can be taken as one’s MDX skills increase?


It is possible that many of the formulas could be replaced with DAX measures in the model, maybe that is a post for another time.


You can download the workbook here


Acknowledgements


Allan Folting of Microsoft first showed me this technique, I am grateful for his insights.


Chris Webb has a series of excellent blog posts on MDX For PowerPivot Users, which are worth reading if this technique interests you.

Cracking The Code



Excel’s ability to run pivot tables against an OLAP cube in Analysis Services is a truly wondrous beast. The ease of use; volumes of data; the filtering; all of these provide a rich environment for users to easily inspect their data and get real information out. Add to this that Microsoft seem to believe that pivot tables are a winner and keep improving it, access to cubes and slicers being some of the latest significant changes, I can see no reason for all serious Excel users not to get to grips with this technology.

In addition to all of this goodness, it is possible to build a table query against the cube using the cube formulae. Sometimes this might be the appropriate way to present your information. But cube formulae are hard, the syntax is pure MDX, and long-winded. For example, using the MDX cube from the SQL Server MDX Step By step book that I used in my last post, a cube formula to get the Reseller Sales Amounts for Cleaner products in 2003 within the US is

=CUBEVALUE(“MDX SBS SBS”,”[Date].[Calendar].[Calendar Year].[CY 2003]”,”[Product].[Product Categories].[Subcategory].[Cleaners]”,”[Measures].[Reseller Sales Amount]”,”[Geography].[Geography].[Country].&[United States]”)

Phew! That is hard work. Imagine repeating that for say years by product category, maybe a 5×20 table of values.

I tend to create simple macros that add lists of available members to a dropdown in the ribbon, a user can pick a business oriented item from this list, and it updates the activecell formula to add the appropriate MDX. This way, a user has a simple way to build the cube formulae, and will see the MDX so they can learn if they are so inclined.

In addition, we can be smarter and simplify the formulae, because each row would use the same product sub-category, each column would use the same calendar year. We can use the CUBEMEMBER function to get the row and column heading values, such as

=CUBEMEMBER(“MDX SBS SBS”,”[Date].[Calendar].[Calendar Year].[CY 2003]”)

and use that within the value formula,

=CUBEVALUE(“MDX SBS SBS”,I$5,”[Product].[Product Categories].[Subcategory].[Cleaners]”,”[Measures].[Reseller Sales Amount]”,”[Geography].[Geography].[Country].&[United States]”)

Similarly, we can use CUBEMEMBER to get the Cleaners member

=CUBEMEMBER(“MDX SBS SBS”,”[Product].[Product Categories].[Subcategory].[Cleaners]”)

and use that within the value formula,

=CUBEVALUE(“MDX SBS SBS”,I$5,$H6,”[Measures].[Reseller Sales Amount]”,”[Geography].[Geography].[Country].&[United States]”)

Finally, the US member

=CUBEMEMBER(“MDX SBS SBS”,”[Geography].[Geography].[Country].&[United States]”)

and use that within the value formula,

=CUBEVALUE(“MDX SBS SBS”,I$5,$H6,”[Measures].[Reseller Sales Amount]”,$H$3)

So far, so good, but how are we supposed to know what those members are, and how to get the full MDX syntax of the member items? Of course, if we know the cube structure intimately and know MDX syntax thoroughly, we have no problems, but most users are not as informed.

Yet again, Excel comes to our rescue, in that it will tell us the cube formulae.

If we take the pivot table that we showed in the last post again, a table of product sub-categories over calendar years for a particular region, Figure 1, which just happens to match up to the example above, we can demonstrate how



Figure 1


If we go to the PivotTable Tools context ribbon, and on the Options tab, Tools group there is an OLAP Tools dropdown. The final item on this dropdown is ‘Convert to Formulas’, Figure 2, which will change the pivot table to a table of cube formulae.



Figure 2


When you select this option, you will get the dialog shown in Figure 3 which allows you to keep any report filters or convert them.



Figure 3


In this example, I retained the report filters, and the result was a table as shown in Figure 4.



Figure 4


If we examine this table, we can see the cube formulae that generate the table, Figure 5 highlighted shows a particular cube member, and the MDX that is required. This could be the end result, we might just use the table as presented, or it can be used as a learning tool, to familiarise with the cube and the MDX syntax.



Figure 5


Whilst this is a very handy trick for getting to understand the MDX required for generating the cube formulae, it is on an individual cell basis. The pivot table itself does not pull all of this data back on a cell by cell basis, it issues a far more complex MDX set query that pulls back all of the data specified in your pivot table. To see this code would be very useful in gaining better MDX understanding, so how can we do this?

There is nothing built into Excel to provide this, but there is a free addin at CodePlex called OLAP PivotTable Extensions which adds a pivot table context menu item. One of the extensions is to show the MDX code. If you right click within the pivot table, you should see the new context menu item, as in Figure 6.



Figure 6


This displays the following dialog, Figure 7, where you can see the MDX code on the MDX tab.



Figure 7


As you can see at first sight, this is quite scary, but most of it is a list of cells being returned, such as [Product].[Product Categories].[Product].[List Price]. Cut and paste it to a text editor and you should be able to format it and make it more readable, helping to understand what it is doing.

One thing to note in this query, as in most pivot table queries off a cube, is that is uses the NON EMPTY clause in the selects. What this effectively does is to outsort any of the member combinations along a particular axis that have nothing but empty cells. At first sight this seems like a good idea, don’t clog up the query with a whole bunch of rows where there is no data. But there is a potential downside to this. What if you want to see some of those empty rows? As in the case where some of your sales team might have targets and some don’t (for instance, customer salesmen as against store salesmen). In this case, you would want to see their empty rows because you would want to know that they are not hitting those targets, but you would not want to see the empty rows for the store sales staff. In an MDX query in SMS you would use the Exists function instead of NON EMPTY, I have yet to find out how to do the same in a pivot table.

Does Excel Do MDX?




One of my primary interests at present is Business Intelligence (BI), using available tools to present meaningful information to the business that actually adds value.


I won’t go into what BI is, the quality of BI products available at present, or the how businesses perceive or fails to perceive BI, that may be a topic for another day, but rather in developing my skills and capabilities in this area. Excel is a wonderful medium for presenting this information, and the majority users that I know would rather use Excel than other tools.


Most people who are familiar with BI will have heard of OLAP cubes, a technology that allows manipulating and analyzing data from multiple perspectives, and provides fast analysis of data in a way that a relational database just cannot match. In Excel, it is very simple to build a pivot table that can plug into an OLAP cube, allowing retrieval of large, complex of data sets easily and quickly.


I have built many applications that utilise pivot tables against cubes, and have also started to build more analyses using the newly added cube functions in Excel. The basic format of a typical cube function, say CUBEVALUE, is


CUBEVALUE(connection, member_expression1,member_expression2…)


with an example formula being


=CUBEVALUE(“MDX Step-By-Step Chapter 3 Cube”,”[Product].[Category].[Bikes]”,”[Measures].[Reseller Sales Amount]”)


where a member_expression, such as “[Product].[Category].[Bikes]”,”[Measures].[Reseller Sales Amount]”)is a text string of a multidimensional expression (MDX) that evaluates to a member or tuple within the cube. This is not the most intuitive syntax I have come across. There are various ways to find these tuples, which we will not go into here but maybe a subject for a later post, but I felt it would be a good idea to get better acquainted with MDX. By getting a better understanding of MDX I fell I would be able to query cubes more effectively, from within SQL Server 2008 Management Studio (SMS), or even within Excel.


To this end, I bought myself SQL Server 2008 MDX Step by Step, a book from Microsoft Press, and the following is my view of this book (yes, I did pay for this book, it is not a free review copy :)).


One thing to explain here, especially for those of you who know this primarily as an Excel blog, is that this book is primarily aimed at developers working in Analysis Services and makes no mention of Excel that I can recall, and all the examples are shown within SMS.


Before I make any specific comments, I will state that I had used MDX prior to reading this book, but felt a bit at sea. It may that as this was my first book on the subject that it found me as a soft target, but I liked the book, and felt it was just about worth my outlay. The comments below should be taken with this overall view in mind.


The book starts gently enough, guiding you through an overview of BI and MDX, introducing the MDX Query Editor in SMS, and explaining the concept of tuple.


Chapter 3 was the first where any real MDX code was deployed, using a simplified cube in order to focus on the points being made. As well as tuples, it introduced attribute and user hierarchies, which are key cube fundamentals to understand.


Chapter 4 was the first where the material became more than just simple explanation and more detailed concepts were introduced. I think that in too many places the authors did a less than satisfactory job of explaining what was going was going on; too often things were stated without an explanation as to why, and the rationale for techniques was not explained. This is especially important in this chapter, the first where we meet these less obvious concepts and where that clear explanation is vital. For example,


- on p67 the Distinct function is introduced, and explains how it can be used to remove any duplicate rows (or columns) that the query might generate. The thought occurred to me that maybe you just add that to every set statement to make sure there are no duplicates, but of course I am sure it comes at a high cost in performance. Nothing was mentioned as to the pros and cons. Just stating what the function does should not be the objective of a book in my view, that is what help is or, it should look to guide you through when and where to use them, best practices, and so on


- when discussing MeasureGroupMeasures in chapter 4, most of the rows returned currency or quantity formatted cells, one returned non-formatted cells, but there was no explanation as to why. Figure 1 below shows the query in question, and Figure 2 shows the results with the cells in question highlighted.



Figure 1



Figure 2


The answer seems to be due to how the cells within the cube space are designed, the format of the cells is given here. By adding some code to the query to get the cell properties



Figure 3


and double-clicking one of the non-formatted cells we can see that the format is null



Figure 4


This is covered as a topic earlier in the book, but it could have been reiterated here in my view, to clarify this seeming anomaly, and to empathise that earlier description.


Quibbles aside, Chapter 4 introduced these key concepts in a logical fashion, and generally did a good job explaining them.


Expressions were introduced in Chapter 5, the ability to add calculated members to the already powerful cube. As the authors state, this adds a whole new dimension (no pun intended) to the cube, and they rightly give it a thorough explanation, 30 pages on a single topic as against the 31 pages of Chapter 4 which covers many topics within sets.


I won’t cover the rest of the chapters here, Chapters 1- 5 covers Part I of the book, MDX Fundamentals. Parts II and III cover MDX Functions and MDX Applications. I will leave you to explore these yourself if you are minded to buy this book, suffice to say that I found them informative, relevant, and appropriate. They are less obviously helpful to me from an Excel perspective, but will help me as I work deeper with cubes in the future.


One thing that I think is difficult, and this is not aimed at this book but is a comment about cubes generally, is that it is very difficult to envisage the data, and see more than what the query returns (without creating other queries) – n-dimensions is just too hard. But we have another excellent tool available to help us, one that I made extensive use of whilst reading this book; that is our good friend Excel. Using Excel and pivot tables I was able to get a much more familiar peek into the cube, one that allowed me to double-check what the authors were saying, and also to give me wider context that helped me extend their quoted queries to delve even deeper.


Giving just one example of what I mean by this, in Chapter 4 there is discussion on querying the members of a hierarchy or hierarchy level. One of the key concepts covered was the difference in querying the members of an attribute hierarchy at the leaf level as against all levels. The query of all levels is shown in Figure 5 below





Figure 5


which returns the data set shown in Figure 6.





Figure 6


This is difficult, at least to me, because there are various levels of data here. ‘Bike Wash – Dissolver’ is a product within the ‘Cleaners’ sub-category, which is within the ‘Accessories’ category, no clear demarcation in the dataset as returned. Figure 7 shows the same query results with some manual highlighting of the groups




Figure 7


By creating a simple pivot table in Excel, Figure 8, I was able to view this same data in a much more readable manner, which helped me to understand the points being made.



Figure 8


Here you can see that Helmets is a sub-category of Accessories very clearly, and of course you have the filtering options (which I used to restrict the calendar dates as per the set SELECTion in the MDX, and geography to the United States as in the MDX WHERE clause), and grouping.


I found creating simple pivots extremely useful in working through the examples.


 


In summary, I found this book well structured, and that it gave me a good understanding of MDX and how to use that within SQL Server 2008 Management Studio. As explained above, it does not explicitly cover using MDX within Excel or VBA, two areas of interest to me, but I gained more than enough information to be able to apply that to my existing Excel and VBA knowledge, and feel far better equipped having read this book.