Self Service Is Cheap?

As I am sure that you are all aware, PowerPivot is the best thing to hit Excel since PED. Although I am not a big data cruncher, I have been playing with PowerPivot; to keep abreast of an important technology; it plays into one of my primary interests of Business Intelligence; and partly to seek out further opportunities.

Although I have been scouring the blogs and dedicated websites, I do like to have a book as a reference point, and so I was very pleased to get a review copy of PowerPivot for the Data Analyst: Microsoft Excel 2010, by Bill Jelen.

Bill Jelen is well known in the Excel community; he is the founder of MrExcel which is probably the most active Excel forum on the web; and is a prolific author and Excel presenter. It is good having an Excel PowerPivot book from someone whose primary focus is Excel, too many (including MS) see Excel as a presentation layer for SQL Server and SharePoint, and fail to acknowledge Excel’s capability and potential. Bill is unlikely to make that mistake.

Bill has a flowing writing style, and reading the book it is easy to imagine him at the front of a room giving one of his Excel presentations. Having access to Rob Collie, who has worked for Microsoft in this technology, was a lucky break for Bill, someone who could check his facts and feed him other ideas.

The book isn’t quite up to its subject in my view. It looks as though it was rushed to market to ride the launch hype, and suffers for that, with errors and what I see as less than relevant content; this feels like padding to me. There are basic typographical errors (the worst is when a trick to get month names to list in month name order is attributed to Colin Banfield, but his name is incorrectly spelt).

Too much of the content appears as if it has been lifted from some of Bill’s many other books. Much of the book is spent on ancillary topics in my view; there is more on standard pivot table usage than is required, such as the PivotTable trick that has impressed some. I have known the trick for some years, and whilst I acknowledge that it is a good trick, I have never found a use for it other than demonstrating it in conferences and training sessions, but most importantly, I cannot see the relevance to this book.

There is a section on basic DAX functions which covers 30 pages. Whilst these functions might be available within DAX, they are the same as the Excel functions and really did not need covering in depth here. I would have preferred more on DAX usage. There is a chapter on DAX measures, which is a good start, but after saying that a whole book could be written about Time Intelligence functions, we get a mere 10 pages. DAX is a real key feature that boosts PowerPivot, and can make them so much more powerful than standard PivotTables.

PowerPivot for the Data Analyst: Microsoft Excel 2010 is very easy to read, and whilst Bill is clearly a PowerPivot fan, he is not blind to some of its foibles and failures, and presents a good section on the pros and cons of PowerPivot.

Overall, as I mentioned, this book looks and feels rushed to me. Although it is quite a good start, I would not recommend it to anyone other than a novice Excel/PowerPivot user. It may be useful for someone that needs to gain a basic understanding of PowerPivot, but I feel there must be a better book out there, or on the production line. I notice that Marco Russo and Alberto Ferrari have recently released an Excel PowerPivot book, perhaps they would like to send me a copy to review?.

One other comment, not aimed specifically at this book, but at this market generally. In an age where we are all using GUI based software, and colour is used extensively within applications to help navigation and usage, publishers should not be printing the pictures in black and white, effectiveness is lost by the lack of colour.

VBA Has No Class

Recently, I was working on one of my apps, one that is database centric. Whilst making some changes, I came across this piece of code that inserts a new record into the database

    With RS

        .Fields(FIELD_AUDIT_DATE) = sh.Cells(Rownum, COL_FU_AUDIT_DATE).Value
        .Fields(FIELD_CONSULTANT_ID) = GenerateID(FIELD_CONSULTANT_ID, Designer)
        .Fields(FIELD_SALES_TYPE_ID) = GenerateID(FIELD_SALES_TYPE_ID, sh.Cells(Rownum, COL_FU_SALES_TYPE).Value)
        .Fields(FIELD_CUSTOMER) = sh.Cells(Rownum, COL_FU_CUSTOMER).Value
        .Fields(FIELD_QUOTE_NUMBER) = QuoteNumber
        .Fields(FIELD_PRODUCT_ID) = GenerateID(FIELD_PRODUCT_ID, sh.Cells(Rownum, COL_FU_PRODUCT).Value)
        .Fields(FIELD_QUOTE_DATE) = sh.Cells(Rownum, COL_FU_QUOTE_DATE).Value
        .Fields(FIELD_QUOTE_AMOUNT) = sh.Cells(Rownum, COL_FU_QUOTE_AMOUNT).Value
        .Fields(FIELD_ESTIMATED_PROFIT) = sh.Cells(Rownum, COL_FU_PROFIT).Value
        .Fields(FIELD_REVISED_QUOTE) = sh.Cells(Rownum, COL_FU_REVISED_QUOTE).Value
        .Fields(FIELD_ACTUAL_PROFIT) = sh.Cells(Rownum, COL_FU_ACTUAL_PROFIT).Value
        .Fields(FIELD_STATUS_ID) = GenerateID(FIELD_STATUS_ID, sh.Cells(Rownum, COL_FU_QUOTE_STATUS).Value)
        .Fields(FIELD_DECLINED_ID) = GenerateID(FIELD_DECLINED_ID, sh.Cells(Rownum, COL_FU_DECLINED).Value)
        .Fields(FIELD_QUOTE_NOTES) = sh.Cells(Rownum, COL_FU_NOTES).Value
        .Fields(FIELD_UPDATED_BY) = ThisApp.LogonUser
    End With

Whilst looking at this code, for some reason my mind wandered to thinking about disconnected recordsets. Whilst most of my application involve database access, it is no longer on enterprise databases, so I don’t have the connection issues of high-end systems. As such, my use of disconnected recordsets is infrequent, my apps are safe in creating a user connection at logon, maintaining the connection throughout their session, and dropping at the end.

As an aside, I always use ADO in my applications. I found it easy to use and it performs fine for me. I have frequently been told that DAO performs better, usually by old Access’ers, but I have no issues with ADO, and will continue with it.

As often happens, my mind started wandering over this topic, forgetting what I was doing and thinking more about disconnected recordsets. I roamed on to thinking about collection classes. Collection classes are a very useful way of creating an in-memory dataset that can be manipulated by creating class methods, but they do require rather a lot of setup. In my musings, it occurred to me that I could use disconnected recordsets to achieve the same functionality, and use the builtin recordset functionality rather than creating my own methods.

Usually, a recordset would connect to a data source at some point, even a disconnected recordset, if only to get the data and/or write it back. It occurrs to me that this is not an absolute necessity, a recordset can be created and used without ever connecting to a data source, for instance where the data is maintained on a spreadsheet.

In this discussion, I will be working with a simple dataset as shown in Figure 1. 

Figure 1

In these examples, I am using late-binding, so I first declare some constants to emulate the ADO constants.

Enum ADOConstants
    adOpenStatic = 3
    adUseClient = 3
    adVarChar = 200
End Enum

Next, the data is extracted from the worksheet, and stored in an array.

Dim RSUsers As Object
Dim vecUsers As Variant
Dim Lastrow As Long
Dim i As Long, j As Long

    With ActiveSheet

        Lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row
        vecUsers = .Range("A1").Resize(Lastrow, 5)
    End With

The recordset is created, and because we are not retrieving the data from a data source (using ADO), we need to initialise the recordset by creating the columns.

    Set RSUsers = CreateObject("ADODB.Recordset")
    With RSUsers

        .Fields.append "FirstName", adVarChar, 25
        .Fields.append "LastName", adVarChar, 25
        .Fields.append "Gender", adVarChar, 1
        .Fields.append "Role", adVarChar, 25
        .Fields.append "Location", adVarChar, 25

        .CursorLocation = adUseClient
        .CursorType = adOpenStatic


Note that you have to use a client side cursor.

Now that we have a defined recordset, we can load it with data. Each row of the array is iterated and a new record is added to the recordset, the fields are populated, and the recordset is updated.

    For i = 2 To Lastrow

        .Fields("FirstName") = vecUsers(i, 1)
        .Fields("LastName") = vecUsers(i, 2)
        .Fields("Gender") = vecUsers(i, 3)
        .Fields("Role") = vecUsers(i, 4)
        .Fields("Location") = vecUsers(i, 5)
    Next i


The recorsdet is now fully populated, contains all of the data, and so it is ready to use. To demonstrate this, I created a simply display function that outputs the recordset contents.

Private Function DisplayDetails( _
    ByVal RS As Object, _
    ByVal Title As String)
Dim msg As String
Dim i As Long
    With RS

        Do Until RS.EOF
            msg = msg & "Name: " & RS.Fields("FirstName").Value & " "
            msg = msg & RS.Fields("LastName").Value & vbNewLine
            msg = msg & vbTab & "Role: " & RS.Fields("Role").Value
            msg = msg & "(" & IIf(RS.Fields("Gender").Value = "M", "Male", "Female") & ")" & vbNewLine
            msg = msg & vbTab & "Location: "
            msg = msg & RS.Fields("Location").Value & vbNewLine
    End With

    MsgBox msg, vbOKOnly + vbInformation, Title
End Function

The first demo shows all of the rows, all of the columns.

    Call DisplayDetails(RSUsers, "List All Users")

We can also use the recordset Find function to find the rows that match the specified criteria. The criteria is based upon a column name.

    RSUsers.Find = "Gender = 'M'"
    Call DisplayDetails(RSUsers, "List All Users")

Find also can be used to skip rows, specify a start point, or set the search direction.

The recordset can be sorted,

    RSUsers.Sort = "Location"
    Call DisplayDetails(RSUsers, "List All Users Sorted By Location")

… or filter it.

    RSUsers.Filter = "Gender = 'M'"
    Call DisplayDetails(RSUsers, "List All Male Users")

You might wonder what is the difference between Find and Filter. Filter allows for multiple criteria, unlike Find.

    RSUsers.Filter = "Gender = 'M' AND Location Like 'P*'"
    Call DisplayDetails(RSUsers, "List All Male Users in P*")

As you can see, the disconnected recordset can do everything a collection class can do, but without having to hand-roll any of the methods, recordset has them builtin.

I think there is a killer usage for disconnected recordsets in Excel VBA apps, I just haven’t thought of it yet.

Cracking The Code

Excel’s ability to run pivot tables against an OLAP cube in Analysis Services is a truly wondrous beast. The ease of use; volumes of data; the filtering; all of these provide a rich environment for users to easily inspect their data and get real information out. Add to this that Microsoft seem to believe that pivot tables are a winner and keep improving it, access to cubes and slicers being some of the latest significant changes, I can see no reason for all serious Excel users not to get to grips with this technology.

In addition to all of this goodness, it is possible to build a table query against the cube using the cube formulae. Sometimes this might be the appropriate way to present your information. But cube formulae are hard, the syntax is pure MDX, and long-winded. For example, using the MDX cube from the SQL Server MDX Step By step book that I used in my last post, a cube formula to get the Reseller Sales Amounts for Cleaner products in 2003 within the US is

=CUBEVALUE(“MDX SBS SBS”,”[Date].[Calendar].[Calendar Year].[CY 2003]”,”[Product].[Product Categories].[Subcategory].[Cleaners]”,”[Measures].[Reseller Sales Amount]”,”[Geography].[Geography].[Country].&[United States]”)

Phew! That is hard work. Imagine repeating that for say years by product category, maybe a 5×20 table of values.

I tend to create simple macros that add lists of available members to a dropdown in the ribbon, a user can pick a business oriented item from this list, and it updates the activecell formula to add the appropriate MDX. This way, a user has a simple way to build the cube formulae, and will see the MDX so they can learn if they are so inclined.

In addition, we can be smarter and simplify the formulae, because each row would use the same product sub-category, each column would use the same calendar year. We can use the CUBEMEMBER function to get the row and column heading values, such as

=CUBEMEMBER(“MDX SBS SBS”,”[Date].[Calendar].[Calendar Year].[CY 2003]”)

and use that within the value formula,

=CUBEVALUE(“MDX SBS SBS”,I$5,”[Product].[Product Categories].[Subcategory].[Cleaners]”,”[Measures].[Reseller Sales Amount]”,”[Geography].[Geography].[Country].&[United States]”)

Similarly, we can use CUBEMEMBER to get the Cleaners member

=CUBEMEMBER(“MDX SBS SBS”,”[Product].[Product Categories].[Subcategory].[Cleaners]”)

and use that within the value formula,

=CUBEVALUE(“MDX SBS SBS”,I$5,$H6,”[Measures].[Reseller Sales Amount]”,”[Geography].[Geography].[Country].&[United States]”)

Finally, the US member

=CUBEMEMBER(“MDX SBS SBS”,”[Geography].[Geography].[Country].&[United States]”)

and use that within the value formula,

=CUBEVALUE(“MDX SBS SBS”,I$5,$H6,”[Measures].[Reseller Sales Amount]”,$H$3)

So far, so good, but how are we supposed to know what those members are, and how to get the full MDX syntax of the member items? Of course, if we know the cube structure intimately and know MDX syntax thoroughly, we have no problems, but most users are not as informed.

Yet again, Excel comes to our rescue, in that it will tell us the cube formulae.

If we take the pivot table that we showed in the last post again, a table of product sub-categories over calendar years for a particular region, Figure 1, which just happens to match up to the example above, we can demonstrate how

Figure 1

If we go to the PivotTable Tools context ribbon, and on the Options tab, Tools group there is an OLAP Tools dropdown. The final item on this dropdown is ‘Convert to Formulas’, Figure 2, which will change the pivot table to a table of cube formulae.

Figure 2

When you select this option, you will get the dialog shown in Figure 3 which allows you to keep any report filters or convert them.

Figure 3

In this example, I retained the report filters, and the result was a table as shown in Figure 4.

Figure 4

If we examine this table, we can see the cube formulae that generate the table, Figure 5 highlighted shows a particular cube member, and the MDX that is required. This could be the end result, we might just use the table as presented, or it can be used as a learning tool, to familiarise with the cube and the MDX syntax.

Figure 5

Whilst this is a very handy trick for getting to understand the MDX required for generating the cube formulae, it is on an individual cell basis. The pivot table itself does not pull all of this data back on a cell by cell basis, it issues a far more complex MDX set query that pulls back all of the data specified in your pivot table. To see this code would be very useful in gaining better MDX understanding, so how can we do this?

There is nothing built into Excel to provide this, but there is a free addin at CodePlex called OLAP PivotTable Extensions which adds a pivot table context menu item. One of the extensions is to show the MDX code. If you right click within the pivot table, you should see the new context menu item, as in Figure 6.

Figure 6

This displays the following dialog, Figure 7, where you can see the MDX code on the MDX tab.

Figure 7

As you can see at first sight, this is quite scary, but most of it is a list of cells being returned, such as [Product].[Product Categories].[Product].[List Price]. Cut and paste it to a text editor and you should be able to format it and make it more readable, helping to understand what it is doing.

One thing to note in this query, as in most pivot table queries off a cube, is that is uses the NON EMPTY clause in the selects. What this effectively does is to outsort any of the member combinations along a particular axis that have nothing but empty cells. At first sight this seems like a good idea, don’t clog up the query with a whole bunch of rows where there is no data. But there is a potential downside to this. What if you want to see some of those empty rows? As in the case where some of your sales team might have targets and some don’t (for instance, customer salesmen as against store salesmen). In this case, you would want to see their empty rows because you would want to know that they are not hitting those targets, but you would not want to see the empty rows for the store sales staff. In an MDX query in SMS you would use the Exists function instead of NON EMPTY, I have yet to find out how to do the same in a pivot table.