Category Archives: 14765

Cooking With CUBEs



As we should all know by now, PowerPivot provides Excel with a powerful way to harness data from one or more sources, and to do further analysis on that data within familiar pivot tables.


Furthermore, because PowerPivot is creating an in-memory cube of the data, it is possible to build an analysis using CUBE formulae. I have blogged a couple of times about CUBE formulae, in Cycling Through The Fog and in Cracking The Code.


In Excel, as with any development, you want your solution to be as flexible and dynamic as possible. This blog is about building dynamic tables using CUBE formulae, but to start with the following formula shows an example of a value extracted from a PowerPivot model using CUBE functions


=CUBEVALUE(“PowerPivot Data”,
CUBEMEMBER(“PowerPivot Data”,”[Measures].[Sum of SalesAmount]”),
CUBEMEMBER(“PowerPivot Data”, “[DimProductCategory].[EnglishProductCategoryName].&[Bikes]”),
CUBEMEMBER(“PowerPivot Data”,”[DimDate].[FiscalYear].&[2006]”))


Equation 1


This formula gets the Sales Amount from the PowerPivot cube for the Bikes product category, for the fiscal year 2006. There will be many values at this intersection, there can be many dates in 2006 and many products within that category, all pre-aggregated in the cube; the CUBEVALUE function returns that aggregate amount


We could build the whole table of values using similar formulae. In our table we need to know what the value is related to, so we have row and column headers that identify the intersection points. We could define those headers using the CUBEMEMBER functions giving a table such as shown in Figure 1 below, which shows a table based on AdventureWorks.



Figure 1 – Table of values over year and product category


The formulas for the headings are


=CUBEMEMBER(“PowerPivot Data”,”[DimDate].[FiscalYear].&[2006]”)
=CUBEMEMBER(“PowerPivot Data”,”[DimDate].[FiscalYear].&[2007]”)
etc. for the column headings,


Equation 2


and


=CUBEMEMBER(“PowerPivot Data”, “[DimProductCategory].[EnglishProductCategoryname].&[Accessories]”)
=CUBEMEMBER(“PowerPivot Data”, “[DimProductCategory].[EnglishProductCategoryname].&[Bikes]”)

etc. for the row headings.


The values at the intersection points simply use these heading cells like so


=CUBEVALUE(“PowerPivot Data”,”[Measures].[Sum of SalesAmount]”,$A3,B$1)


This is equivalent to the formula given in Equation 1.


Slicing the Vegetables


Further richness is bestowed upon us because we can also link slicers to our table, giving us the sort of filtering we have with the pivot tables. For example, Figure 2 shows the same data table built using CUBE formulae with a fiscal year slicer; the data reflecting the fact that only the years 2006, 2007, and 2008 have been selected.



Figure 2 – Table of values reflecting years slicer selections


Showing the slicer selections on your report has been covered elsewhere, but it is so useful and asked so often that I thought I would also cover. I also have a couple of variations that I haven’t seen elsewhere, which are worth presenting.


The Menu


Previously, as shown in the formulae in Equation 2, we built the row and column headers using hard-coded values for the year and category fields. We need to be more dynamic in how we list these values. To show the slicer selections as in E5, F5, etc., we need a list of values from which we can choose and display the individual ordered items. The CUBESET function gives us this. The syntax for CUBESET is


CUBESET(connection. set_expression, , [sort_order], [sort_by])


where connection is the cube, set_expression is the set of values required, and caption is a value to display. So, looking at cell D1 we have the formula


=CUBESET(“PowerPivot Data”,Slicer_FiscalYear,”Set of Years


Equation 3


which would look as shown in Figure 3 when added to cell D1 to build our set of fiscal years.



Figure 3 – Slicer years set formula


As can be seen, we use Slicer_FiscalYear as the set_expression, so the set will include all selected values in that slicer, with the caption signifying the cell contents.


So far, so good, but we still need to list those selected values. For this, we use the CUBERANKEDMEMBER function, which returns the nth, or ranked, member in a set. The syntax of this


=CUBERANKEDMEMBER(connection, set_expression, rank, )


where connection is the cube as before, set_expression is the set of values to choose from, and rank is nth value. So, to get the first member, we use


=CUBERANKEDMEMBER(“PowerPivot Data”,$D$1,1)


for the second


=CUBERANKEDMEMBER(“PowerPivot Data”,$D$1,2)


and so on.


Because there are 5 years in the PowerPivot model, and when filtered in the slicer we might be showing less than 5, we need to cater for a variable number of items. The simplest way is just to add an error wrapper around the formula,


=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$1,1),””)


=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$1,2),””)


etc.


Equation 4


Why Extra Ingredients?


We could nest the CUBESET function within the CUBERANKEDMEMBER,but that would mean that a set is evaluated 5 times. By defining the set in its own cell and referring to that cell within the CUBERANKEDMEMBER function, it is evaluated just the once. A small matter, but it makes the spreadsheet easier to maintain, and is more efficient.


Cooked To Perfection


It’s as simple as that.


But hang on a minute, have we overcooked it?


Looking at the syntax definition for these two functions, we can see that they both take set_expression as an argument. The CUBESET function is passed the slicer values as its set, and in turned is passed to the CUBERANKEDMEMBER function as its set.


As the slicer values is a set_expression, you would think that we should be able to pass the slicer values directly to CUBERANKEDMEMBER as a set and be done with. And so we can, these formulae


=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,Slicer_FiscalYear,1),””)


=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,Slicer_FiscalYear,2),””)


etc.


Equation 5


work equally as well as those formulae in Equation 4


Managing The Ingredients


The list of product categories can also be listed in a similar way using CUBESET and CUBERANKEDMEMBER. Here we do need CUBESET as there is no pre-defined set of values as we had with the fiscal year that we can pick up. The set will be all values for the Product Category English name in the Product Category table,


=CUBESET(“PowerPivot Data”,
“[DimProductCategory].[EnglishProductCategoryName].Children”,
“Set of Categories”)


Equation 6


As can be seen, .Children gets us all of the category values.


One thing to note is the use of the caption argument. Again, this helps to highlight the cell containing the set.


We now have formulae that can define our full table, such as


D1: the formula in Equation 3
=CUBESET(“PowerPivot Data”,Slicer_FiscalYear,”Set of Years”)


D2: the formula in Equation 6
=CUBESET(“PowerPivot Data”,
“[DimProductCategory].[EnglishProductCategoryName].Children”,
“Set of Categories”)


E5:I5: the formulae in Equation 5
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,Slicer_FiscalYear,1),””)
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,Slicer_FiscalYear,2),””)

etc.,


D6:D9: formulae for the product categories
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$2,1),””)
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$2,2),””)
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$2,1),””)
=IFERROR(CUBERANKEDMEMBER(“PowerPivot Data”,$D$2,2),””)


And finally, in E6:I9, the formulae for the values
=IFERROR(CUBEVALUE(“PowerPivot Data”,”[Measures].[Sum of SalesAmount]”,$D6,E$5),””)
etc., each cell reflecting the correct product category ($D6) and fiscal year (E$5).


Our table now has a full set of values, and reflects the choices made in the fiscal year slicer.


(If we wished, we could add the product categories to a slicer, and make our table dynamically reflect that.)


Ready to Serve?


Although we have been diligent in storing the evaluated sets in one place rather than nest a CUBESET function within the CUBERANKEDMEMBER function, but there are still a number of things going on here that I just don’t like:


  • The connection is hard-coded, multiple times
  • If a new year is added to the data, just copying cell I5 to J5 won’t work because the rank is hard-coded in the formula, it will need a small change
  • If no selection is made in the fiscal year slicer, the values shown are the total of all years, with a header value of All, as shown in Figure 4. This may be what is required in some instances, in others we may want to show each year’s values individually.


Figure 4 – Showing all years as a total with no slicer selections


These ‘difficulties’ can be overcome relatively easily.


Rather than hard-code the connection within each formula, put the connection text ‘PowerPivot data’ (without the quotes), in a cell, say D3, and assign an Excel name _cube, then use that defined name within the formulae. Note that this also makes the transition to Excel 2013 simpler, where the connection has now changed to ‘ThisWorkbookDataModel’ (again, without the quotes).


The rank is managed by using a function that returns a variable number depending upon the row or column of the cell, namely ROW([reference]) or COLUMN([reference]). You might think that you can use COLUMN(A1) in E5, and copy that across so that it updates to COLUMN(B1), COLUMN(C1), etc. Believe me, this is a very bad idea. Although everything will be fine at first, what happens if you decide to insert a column before column E? The answer is that COLUMN(A1) will update to COLUMN(B1) and whereas the first column of year table originally reflected the first selected year in the fiscal year slicer, it will now reflect the second. You might say that you would never do that, but no-one ever does until they do. For the sake of a simple change it is hardly worth risking it.


The suggested change is to use COLUMN()-COLUMN($D$5), which uses the top left cell of our table as an anchor point. Thus, a formula in cell E5 using these functions will return 1 for that calculation, and so on. If a column is inserted to the left of the table, those parts of formulae will update to COLUMN()-COLUMN($E$5), which means the formula that was in cell E5 which has now moved to cell F5 still return 1 for that calculation.


Similarly, the category list will use ROW()-ROW($D$5).


Finally, how can we show each year in the column headers and the vales for those years when no slicer selection is made, rather than showing ‘All’ and totals for all years? We already have the formula in D1 that gets the set of selected slicer years, that is


=CUBESET(_cube,Slicer_FiscalYear,”Slicer Years”)


As we showed before, we can get a set of all year regardless of slicer selection with the CUBESET function and the member’s children property, as in


CUBESET(_cube,”[DimDate].[FiscalYear].Children”,”Dimension Years”)


But how do we know when to use which? One way would be to test whether the first member of this slicer set returns All. If it does, there are no slicer selections so we show all years individually, if not we show the slicer selected years. We can check the first slicer set value with the following


IF(CUBERANKEDMEMBER(_cube,Slicer_FiscalYear,1)<>”All”, …


Adding all three elements together, we have the following formula in D1 that determines what goes into the set of years that will drive the table column headings


=IF(CUBERANKEDMEMBER(_cube,Slicer_FiscalYear,1)<>”All”,
CUBESET(_cube,Slicer_FiscalYear,”Slicer Years”),
CUBESET(_cube,”[DimDate].[FiscalYear].Children”,”Dimension Years”))


With this formula to get the years set, we can see all the years listed when no selections are made on the ribbon rather than showing all year totals, as in Figure 5.



Figure 5 – Showing all years with no slicer selections


What’s For Dessert?


That’s about it. Using this technique we have a table that shows the value by year by product category, with a slicer for selecting specific years which is reflected in the years shown in the table. The years and product categories are dynamically built and so can accommodate extra years and extra categories in the source data, and the years can also handle a full slicer set without showing the values as totals for all years.

Formulas Made Easy




I have been playing with PowerPivot recently, and one of the sites I have visited is Rob Collie’s PowerPivotPro blog, http://powerpivotpro.com/. One particular post that caught my eye was a guest post by Colin Banfield, on building complex DAX formulas in NotePad++. As Colin noted, the PowerPivot formula editor is severely limited, a better option is required when working with more complex DAX formulae. As Colin suggested, NotePad++ can provide this better option; with code folding, keyword highlighting, and Intellisense. In his post, Colin showed how to extend NotePad++ to provide this functionality for DAX formulas.


Being primarily an Excel guy, I couldn’t help but empathise, I feel exactly the same about creating complex Excel formulas. Sure, it is possible to break formulas over several lines by inserting Alt-Enter into the string, and Excel 2007 and 2010 has improved matters somewhat with the resizable formula bar, but these are tweaks around the edges, the core problem remains (the resizable formula bar is a good idea in concept, but in practice it takes up far too much real-estate, which when added to the space gobbler that is the ribbon, makes it far from ideal).


So, inspired by Colin’s post, I set about to extend NotePad++ to support Excel formulas. Believe me, with the huge number of functions, creating tooltips and argument lists, this was no trivial task. I was amazed at the number of functions I hadn’t come across before, besides those that I never use.


Assuming you have NotePad++ to hand, this is how to extend NotePad++ for Excel. This duplicates much of what Colin has written, but it is worthwhile repeating for clarity.


Notepad++ saves language definitions in an XML User Defined Language file (UDL), you can start the XML User Defined Language editor from the menu (View -> User Define Dialog…), or from the toolbar button.


The first thing to do is to give your UDL an identifier, I call mine XLF here, and to specify the extension of any files to use this UDL, again mine is xlf.


The first tab allows setting up the folder options. An Excel formula has no obvious fold points, so I have invented two keywords, BBEGIN and BEND, that  I embed in the formula simply to provide the fold points. Clearly, these need to be removed before copy-pasting the formula back to Excel. Figure 1 shows my folder tab settings.



Figure 1.


The second tab is used to define the keywords, and the highlight colour. Figure 2 shows my Keyword settings. You can see that even though I have used all four keyword sets, they have the same attributes. This is because I used an Excel spreadsheet to generate this data (surprise!), and I grouped them in standard functions, 2007 new functions, 2010 new functions, and cube functions.  



Figure 2.


Next, we have in Figure 3 the tab for defining comments. Unfortunately, an Excel formula doesn’t support comments, so I use this just to add some preliminary comments if I want to save the formula in a file.



Figure 3.


Finally, figure 4 shows the operator settings.



Figure 4.


So far we have defined the UDL for our particular edit type, and when XLF is selected from the Language menu, any input will be formatted according to these settings. Figure 5. Shows a ‘typical’ Excel formula created in Notepad++.



Figure 5.


Where it gets really good is by adding intellisense, description and argument tooltips. To do this, you have to create an XML file with the same name as your chosen language id, that is XLF.xml in this case. This file is stored in the NotePad++ file directory, in Notepad++\plugins\APIs folder. The format of a typical function is shown in Figure 6.


<KeyWord name="AVERAGEIFS" func="yes">
  <Overload retVal="number" descr="Returns the average (arithmetic mean) for the cells specified by a given set of conditions or criteria. Excel 2007 and later">
    <Param name="average_range"/>
    <Param name="criteria_range1"/>
    <Param name="criteria1"/>
    <Param name="[criteria_range2]"/>
    <Param name="[criteria2]"/>
    <Param name="…"/>
  </Overload>
</KeyWord> 

Figure 6.


After creating all of the xml code (auto-generated in Excel for me), and building the XLF.XML file, when inserting a function in NotePad++ you get the following assistance, as shown in figures 7 and 8.


 



Figure 7.



Figure 8.


Update


Notepad++ has to be enabled to allow auto-complete and function tips. To do this, go to menu Settings>Preferences>Backup/Auto-Completion and check the Enable auto-completion and Function Parameters hint boxes, as shown in Figure 9. 



Figure 9.


To save you the chore of recreating all of this data, I have provided a zip file containing the XLF.xml file, and the exported UDL data. The XML file has to be stored in the Notepad++\plugins\APIs folder, and the UDL file can be imported within the UDL.