Does Excel Do MDX?




One of my primary interests at present is Business Intelligence (BI), using available tools to present meaningful information to the business that actually adds value.


I won’t go into what BI is, the quality of BI products available at present, or the how businesses perceive or fails to perceive BI, that may be a topic for another day, but rather in developing my skills and capabilities in this area. Excel is a wonderful medium for presenting this information, and the majority users that I know would rather use Excel than other tools.


Most people who are familiar with BI will have heard of OLAP cubes, a technology that allows manipulating and analyzing data from multiple perspectives, and provides fast analysis of data in a way that a relational database just cannot match. In Excel, it is very simple to build a pivot table that can plug into an OLAP cube, allowing retrieval of large, complex of data sets easily and quickly.


I have built many applications that utilise pivot tables against cubes, and have also started to build more analyses using the newly added cube functions in Excel. The basic format of a typical cube function, say CUBEVALUE, is


CUBEVALUE(connection, member_expression1,member_expression2…)


with an example formula being


=CUBEVALUE(“MDX Step-By-Step Chapter 3 Cube”,”[Product].[Category].[Bikes]”,”[Measures].[Reseller Sales Amount]”)


where a member_expression, such as “[Product].[Category].[Bikes]”,”[Measures].[Reseller Sales Amount]”)is a text string of a multidimensional expression (MDX) that evaluates to a member or tuple within the cube. This is not the most intuitive syntax I have come across. There are various ways to find these tuples, which we will not go into here but maybe a subject for a later post, but I felt it would be a good idea to get better acquainted with MDX. By getting a better understanding of MDX I fell I would be able to query cubes more effectively, from within SQL Server 2008 Management Studio (SMS), or even within Excel.


To this end, I bought myself SQL Server 2008 MDX Step by Step, a book from Microsoft Press, and the following is my view of this book (yes, I did pay for this book, it is not a free review copy :)).


One thing to explain here, especially for those of you who know this primarily as an Excel blog, is that this book is primarily aimed at developers working in Analysis Services and makes no mention of Excel that I can recall, and all the examples are shown within SMS.


Before I make any specific comments, I will state that I had used MDX prior to reading this book, but felt a bit at sea. It may that as this was my first book on the subject that it found me as a soft target, but I liked the book, and felt it was just about worth my outlay. The comments below should be taken with this overall view in mind.


The book starts gently enough, guiding you through an overview of BI and MDX, introducing the MDX Query Editor in SMS, and explaining the concept of tuple.


Chapter 3 was the first where any real MDX code was deployed, using a simplified cube in order to focus on the points being made. As well as tuples, it introduced attribute and user hierarchies, which are key cube fundamentals to understand.


Chapter 4 was the first where the material became more than just simple explanation and more detailed concepts were introduced. I think that in too many places the authors did a less than satisfactory job of explaining what was going was going on; too often things were stated without an explanation as to why, and the rationale for techniques was not explained. This is especially important in this chapter, the first where we meet these less obvious concepts and where that clear explanation is vital. For example,


- on p67 the Distinct function is introduced, and explains how it can be used to remove any duplicate rows (or columns) that the query might generate. The thought occurred to me that maybe you just add that to every set statement to make sure there are no duplicates, but of course I am sure it comes at a high cost in performance. Nothing was mentioned as to the pros and cons. Just stating what the function does should not be the objective of a book in my view, that is what help is or, it should look to guide you through when and where to use them, best practices, and so on


- when discussing MeasureGroupMeasures in chapter 4, most of the rows returned currency or quantity formatted cells, one returned non-formatted cells, but there was no explanation as to why. Figure 1 below shows the query in question, and Figure 2 shows the results with the cells in question highlighted.



Figure 1



Figure 2


The answer seems to be due to how the cells within the cube space are designed, the format of the cells is given here. By adding some code to the query to get the cell properties



Figure 3


and double-clicking one of the non-formatted cells we can see that the format is null



Figure 4


This is covered as a topic earlier in the book, but it could have been reiterated here in my view, to clarify this seeming anomaly, and to empathise that earlier description.


Quibbles aside, Chapter 4 introduced these key concepts in a logical fashion, and generally did a good job explaining them.


Expressions were introduced in Chapter 5, the ability to add calculated members to the already powerful cube. As the authors state, this adds a whole new dimension (no pun intended) to the cube, and they rightly give it a thorough explanation, 30 pages on a single topic as against the 31 pages of Chapter 4 which covers many topics within sets.


I won’t cover the rest of the chapters here, Chapters 1- 5 covers Part I of the book, MDX Fundamentals. Parts II and III cover MDX Functions and MDX Applications. I will leave you to explore these yourself if you are minded to buy this book, suffice to say that I found them informative, relevant, and appropriate. They are less obviously helpful to me from an Excel perspective, but will help me as I work deeper with cubes in the future.


One thing that I think is difficult, and this is not aimed at this book but is a comment about cubes generally, is that it is very difficult to envisage the data, and see more than what the query returns (without creating other queries) – n-dimensions is just too hard. But we have another excellent tool available to help us, one that I made extensive use of whilst reading this book; that is our good friend Excel. Using Excel and pivot tables I was able to get a much more familiar peek into the cube, one that allowed me to double-check what the authors were saying, and also to give me wider context that helped me extend their quoted queries to delve even deeper.


Giving just one example of what I mean by this, in Chapter 4 there is discussion on querying the members of a hierarchy or hierarchy level. One of the key concepts covered was the difference in querying the members of an attribute hierarchy at the leaf level as against all levels. The query of all levels is shown in Figure 5 below





Figure 5


which returns the data set shown in Figure 6.





Figure 6


This is difficult, at least to me, because there are various levels of data here. ‘Bike Wash – Dissolver’ is a product within the ‘Cleaners’ sub-category, which is within the ‘Accessories’ category, no clear demarcation in the dataset as returned. Figure 7 shows the same query results with some manual highlighting of the groups




Figure 7


By creating a simple pivot table in Excel, Figure 8, I was able to view this same data in a much more readable manner, which helped me to understand the points being made.



Figure 8


Here you can see that Helmets is a sub-category of Accessories very clearly, and of course you have the filtering options (which I used to restrict the calendar dates as per the set SELECTion in the MDX, and geography to the United States as in the MDX WHERE clause), and grouping.


I found creating simple pivots extremely useful in working through the examples.


 


In summary, I found this book well structured, and that it gave me a good understanding of MDX and how to use that within SQL Server 2008 Management Studio. As explained above, it does not explicitly cover using MDX within Excel or VBA, two areas of interest to me, but I gained more than enough information to be able to apply that to my existing Excel and VBA knowledge, and feel far better equipped having read this book.

Formulas Made Easy




I have been playing with PowerPivot recently, and one of the sites I have visited is Rob Collie’s PowerPivotPro blog, http://powerpivotpro.com/. One particular post that caught my eye was a guest post by Colin Banfield, on building complex DAX formulas in NotePad++. As Colin noted, the PowerPivot formula editor is severely limited, a better option is required when working with more complex DAX formulae. As Colin suggested, NotePad++ can provide this better option; with code folding, keyword highlighting, and Intellisense. In his post, Colin showed how to extend NotePad++ to provide this functionality for DAX formulas.


Being primarily an Excel guy, I couldn’t help but empathise, I feel exactly the same about creating complex Excel formulas. Sure, it is possible to break formulas over several lines by inserting Alt-Enter into the string, and Excel 2007 and 2010 has improved matters somewhat with the resizable formula bar, but these are tweaks around the edges, the core problem remains (the resizable formula bar is a good idea in concept, but in practice it takes up far too much real-estate, which when added to the space gobbler that is the ribbon, makes it far from ideal).


So, inspired by Colin’s post, I set about to extend NotePad++ to support Excel formulas. Believe me, with the huge number of functions, creating tooltips and argument lists, this was no trivial task. I was amazed at the number of functions I hadn’t come across before, besides those that I never use.


Assuming you have NotePad++ to hand, this is how to extend NotePad++ for Excel. This duplicates much of what Colin has written, but it is worthwhile repeating for clarity.


Notepad++ saves language definitions in an XML User Defined Language file (UDL), you can start the XML User Defined Language editor from the menu (View -> User Define Dialog…), or from the toolbar button.


The first thing to do is to give your UDL an identifier, I call mine XLF here, and to specify the extension of any files to use this UDL, again mine is xlf.


The first tab allows setting up the folder options. An Excel formula has no obvious fold points, so I have invented two keywords, BBEGIN and BEND, that  I embed in the formula simply to provide the fold points. Clearly, these need to be removed before copy-pasting the formula back to Excel. Figure 1 shows my folder tab settings.



Figure 1.


The second tab is used to define the keywords, and the highlight colour. Figure 2 shows my Keyword settings. You can see that even though I have used all four keyword sets, they have the same attributes. This is because I used an Excel spreadsheet to generate this data (surprise!), and I grouped them in standard functions, 2007 new functions, 2010 new functions, and cube functions.  



Figure 2.


Next, we have in Figure 3 the tab for defining comments. Unfortunately, an Excel formula doesn’t support comments, so I use this just to add some preliminary comments if I want to save the formula in a file.



Figure 3.


Finally, figure 4 shows the operator settings.



Figure 4.


So far we have defined the UDL for our particular edit type, and when XLF is selected from the Language menu, any input will be formatted according to these settings. Figure 5. Shows a ‘typical’ Excel formula created in Notepad++.



Figure 5.


Where it gets really good is by adding intellisense, description and argument tooltips. To do this, you have to create an XML file with the same name as your chosen language id, that is XLF.xml in this case. This file is stored in the NotePad++ file directory, in Notepad++\plugins\APIs folder. The format of a typical function is shown in Figure 6.


<KeyWord name="AVERAGEIFS" func="yes">
  <Overload retVal="number" descr="Returns the average (arithmetic mean) for the cells specified by a given set of conditions or criteria. Excel 2007 and later">
    <Param name="average_range"/>
    <Param name="criteria_range1"/>
    <Param name="criteria1"/>
    <Param name="[criteria_range2]"/>
    <Param name="[criteria2]"/>
    <Param name="…"/>
  </Overload>
</KeyWord> 

Figure 6.


After creating all of the xml code (auto-generated in Excel for me), and building the XLF.XML file, when inserting a function in NotePad++ you get the following assistance, as shown in figures 7 and 8.


 



Figure 7.



Figure 8.


Update


Notepad++ has to be enabled to allow auto-complete and function tips. To do this, go to menu Settings>Preferences>Backup/Auto-Completion and check the Enable auto-completion and Function Parameters hint boxes, as shown in Figure 9. 



Figure 9.


To save you the chore of recreating all of this data, I have provided a zip file containing the XLF.xml file, and the exported UDL data. The XML file has to be stored in the Notepad++\plugins\APIs folder, and the UDL file can be imported within the UDL.

The US Isn’t The Whole World?




Microsoft sneaked a couple of new functions into Excel 2010. These functions are WORKDAY.INTL and NETWORKDAYS.INTL.


They appear to be addressing the age-old question of handling different working day patterns around the globe. Whilst we have Saturday and Sunday off, some countries may have Friday and Saturday. And of course, when planning work, some poor souls only get a single day of the week off. These new functions provide a new optional argument, weekend, to specify which days are to be treated as weekend days.


This should remove those complex formula solutions to handle this particular problem.


Unfortunately, MS as ever seem to have over-engineered it.


The help describes the weekend argument thus

Weekend  Optional. Indicates the days of the week that are weekend days and are not considered working days. Weekend is a weekend number or string that specifies when weekends occur

which suggests that we specify what days are in our weekend. But how do we define them?

Later the help provides a table of weekend values,

weekend-number Weekend days
1 or omitted Saturday, Sunday
2 Sunday, Monday
3 Monday, Tuesday 
4 Tuesday, Wednesday 
5 Wednesday, Thursday
6 Thursday, Friday
7 Friday, Saturday 
11 Sunday only
12 Monday only
13 Tuesday only
12 Wednesday only
15 Thursday only
16 Friday only
17 Saturday only

and later its says …

Weekend string values are seven characters long and each character in the string represents a day of the week, starting with Monday. 1 represents a non-workday and 0 represents a workday. Only the characters 1 and 0 are permitted in the string. 1111111 is an invalid string.

They present a table of decimal values then tell me that Weekend is a string of day indicators. Okay, I admit, it did say earlier that weekend is a weekend number or string that specifies when weekends occur, but why the need for two ways of telling it which days? And where is Monday, Tuesday and Wednesday as a combination in the table of values?

So I started looking at this, and my first thought was that weekend was a binary string, and the table above was the decimal numbers associated with the binary string.

First, in the string they have 1 representing a non-working day and 0 or a working day. I appreciate they call it weekends, but that does strike me as very odd.

Then, the decimal table presented doesn’t map to the binary values. Even allowing for their upside down logic, Saturday and Sunday weekends, or working days of Mon, Tue, Wed, Thu and Fri would be 0000011 which is 3, not 1.

The table of values shown doesn’t show all possibilities, such as weekend days of Saturday, Sunday and Monday, but the weekend string suggests that it will support this.

I can’t figure out that table of values, so even if it does support more than 2 weekend days, it doesn’t seem possible to use a decimal value to use in such circumstances. If I enter =WORKDAY.INTL(–“2010-09-06″,5,8), it returns #NUM.

All in all, I cannot help but conclude that having two ways of representing the weekends is pointless, the decimal values are limited, and the weekend string representation is so much simpler to envisage.

I have played with it, and the weekend string seems to work. =WORKDAY.INTL(–“2010-09-06″,5,”0010011″) gave me Tue 14-Sep, which is Tuesday, Thursday, Friday, Monday and Tuesday past the Monday of 6th Sep. It caters nicely for holidays as well.