A review: "Learning SQL Server 2008 Reporting Services” by Jayaram Krishnaswamy

A while back I got asked if I’d review this book, but as they could only get me a copy in PDF format, it’s taken a while to get to it. Whilst I like having electronic copies of books, I generally prefer to read them in paper form, and just have the electronic copy for reference.

But anyway – this book calls itself “A step-by-step guide to getting the most of Microsoft SQL Server Reporting Services 2008”. At 536 pages, I remember my first thoughts being that it seemed awfully long for the level that it’s aimed at. However, as I went through it, I found that this was largely down to the sheer number of screenshots. This is great, and particularly for an eBook. It means that someone trying to learn about a particular screen can easily click the “Next Page” button until they see that screen.

As for whether or not I’d recommend this book to someone learning Reporting Services – well, it depends on your style of learning.

If you want something deeper, that explains more about the choices you’re making, and the paradigms you should be considering, then maybe this book isn’t for you.

But if you want to jump into SSRS, looking at example after example, to get a hands-on feel for the product, then this book is great. It feels like labs when you’re reading it, and that’s going to suit a lot of people.

Counting consecutive instances with DENSE_RANK

It’s hard to think of a practical use for DENSE_RANK(), but here’s one.

It’s one of the ranking functions that was introduced in SQL Server 2005, along with RANK(), NTILE(n) and the incredibly useful ROW_NUMBER(). But apart from producing reports, it’s quite hard to find genuine uses for some of them.

Yesterday I answered a question at StackOverflow (first time – I often answer questions at Experts Exchange and on the MSDN Forums, but never at StackOverflow before). It’s at http://stackoverflow.com/questions/1176011/sql-to-determine-minimum-sequential-days-of-access/1176255#1176255

The problem is trying to find runs of consecutive days of activity. I solved it using ROW_NUMBER() and the integer value of the day, grouping by the offset between them. When you partition the ROW_NUMBER() by the users, you have a complete solution. It works nicely.

But if people can have multiple entries in a day, then ROW_NUMBER() doesn’t cut it – the offset would change whenever someone didn’t have exactly one record per day. Of course, there might be logic in place to make sure this never happens, and I hope there would be – but if there isn’t the problem just becomes one for DENSE_RANK().

You see, DENSE_RANK() won’t go up for ties. If you have two entries on the same day, they’ll be given the same rank. And then the next day will be as if there had only been one the previous day. If you have ten 107ths, the next DENSE_RANK() is 108 (RANK() would give the next value as 117).

So then my query of:

with numberedrows as
(
        select row_number() over (partition by UserID order by CreationDate) – cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
        from tablename
)
select min(CreationDate), max(CreationDate), count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset;

…becomes:

with numberedrows as
(
        select dense_rank() over (partition by UserID order by cast(CreationDate-0.5 as int)) – cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
        from tablename
)
select min(CreationDate), max(CreationDate), datediff(day, min(CreationDate), max(CreationDate)) + 1 as NumConsecutiveDays, count(*) as NumEntries, UserID
from numberedrows
group by UserID, TheOffset;

Bear in mind that I’m having to truncate the CreationDate here. If the values were stored as an date with no time component, I’d be able to use CreationDate without modification, which lends itself much better to indexing strategies. For this scenario, I’d prefer to have an index on a computed column which was the truncated date.

So there you go – a practical use for DENSE_RANK().

Annoying SharePoint error about features and templates

Following a successful installation of SharePoint (WSS) and making a new web application using a content database restored from another server, this annoying error appeared when on one of the sites.

Feature ‘75a0fea7-2d1e-451a-b445-16bc346d78e’ for list template ‘1’ is not installed in this farm. The operation could not be completed.

I say “annoying” because it seemed really hard to fix – hence this blog post.

By Binging for that GUID, I could see that it was related to the BugList template. That made sense, because the site in question is focussed on that. But I couldn’t see how to get the BugList template working. Being a feature (from the error message), I figured it would help to run SharePoint.exe (the setup program) again, but the only options in there were Repair and Remove, neither of which was looking like helping. Running the “SharePoint Products and Technologies Configuration Wizard” didn’t help either – there was nothing in there about features or templates.

A colleague pointed me to an application template download site, which helped, giving me a bunch of .stp and .wsp files, one of which was BugDatabase.wsp. Not BugList, not Bug Tracking, not Bug Tracker, but Bug Database. I should’ve taken a hint from here, but anyway.

A bit more Binging (and looking in the readme.txt that came with the download), and I found a page that told me how to install a .wsp file – I needed to use stsadm. But first I needed to install ApplicationTemplateCore.wsp. I wasn’t entirely convinced it was going to be the right solution, but I gave it a try.

“\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\BIN\stsadm.exe” –o addsolution –filename e:\AllTemplates\ApplicationTemplateCore.wsp

It didn’t work at first, but it turned out that I was copying the command line from a website, and it can’t’ve been recognising one of the spaces. I typed it in myself, and it worked first time.

There were more instructions on deploying the solution, but it turned out those instructions weren’t quite so useful, because when I went into the Central Administration site, in Operations, Solution Manager, I found that the deployment hadn’t been successful, despite stsadm telling me that it had been. So the next time I had to do this, I just added the solution with stsadm and then went straight to the Solution Manager, because I could deploy it from there.

In the Solution Manager, you click on the applicationtemplatecore.wsp entry and hit the “Deploy Solution” button. Hopefully it will work for you, as it did for me.

Terrific! Half way there.

Now I could do the same for BugDatabase.wsp.

“\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\BIN\stsadm.exe” –o addsolution –filename e:\AllTemplates\BugDatabase.wsp

..and go into Solution Manager to deploy it.

Now my site was up. But when I went into the Issues List, that was broken with a similar message, just a different GUID and template number.

Feature ‘60d1e34f-0eb3-4e56-9049-85daabfec68c’ for list template ‘1010’ is not installed in this farm. The operation could not be completed.

I could see this was a feature called pwsissues (thanks again Bing), but I couldn’t see a .wsp file that suggested anything. I hunted around for clues, even came across a discussion at West Michigan SharePoint User Group about it. It didn’t really help much though, I was still hunting around trying to figure out what I had to do. I felt it should be similar to my previous problem, so in desperation I started adding .wsp files indiscriminately.

And it worked! I should’ve tried this earlier, remembering that BugDatabase.wsp wasn’t called BugTracking.wsp. To get the Issues Tracking template in, I needed to add (and deploy) the solution called ProjectTrackingWorkspace.wsp.

Once I had done that, my errors disappeared, and I breathed a big sigh of relief.

I quite like SharePoint – it’s a very useful tool for many reasons. I guess I just haven’t had quite enough SharePoint _admin_ experience. But now I have a blog post to refer to if I come across this error again.

A really good book on BI

Business Intelligence is so much more than the technologies involved. Doing BI well is about delivering the right solution for the client, and being able to plan accordingly, about finding a match between the technologies and the business needs, and about being able to come up with a good design that incorporates not only the data warehouse, but also the OLAP database and client interfaces.

When I first picked up Smart Business Intelligence Solutions with Microsoft SQL Server 2008, I wasn’t sure what to expect. I didn’t know whether it would be design heavy, or technology heavy. I suspected it would be the latter, but without going deeply enough into the technologies.

But I was pleasantly surprised. It seems to be heavyweight enough in both design and technology. There are useful sections on every technical aspect of Business Intelligence, including Data Mining, Beginner and Advanced MDX, and more. And yet there are useful sections about BI principles – the kind of things that you come across in books by Kimball.

I feel like there are an increasing number of really good books about SQL Server, and this is definitely one.

Hey Microsoft – let me restrict my network connections better!

I’m sure you know the feeling… you jump online by tethering to your mobile because you want to check a particular website, and all of a sudden you find your connection is being flooded by a Windows Update, or Live Mesh download, or a Sync Framework thing. Perhaps even Outlook is downloading your Junk Mail folder for you.

But all you wanted was to check what the weather was – but it’s just cost you a fortune in download fees.

Situation is that running applications will check for an internet connection, and then take advantage of it – with no consideration about whether it’s going to cost you (per kilobyte!) to use that connection.

I would love to see a way of configuring my network connections (like how it asks me whether I’m at Home, Public or Work) into a profile, along the lines of:

  • Anything connect – all my apps can use this
  • Sometimes connect – apps need to ask me to use the connection before they do
  • Never connect – I would need to explicitly allow an application to use this connection

When I connect using an account that’s limited by size, I don’t want Silverlight to download a ton of images, or Windows Update to pull down a Service Pack. But I would be happy for a particular instance of a browser to connect out.

But when I’m at home, I’m happy for anything to be pulled down.

I’d probably end up using the “Sometimes connect” option a lot, so that my connection is more ‘available’, even when one of my applications decides to pull a ton of information down – but right now, connectivity is either ON or OFF, and I can’t easily stop applications from using it when it’s ON.

Please, Microsoft… let me choose this. It’ll help me feel okay about connecting while my applications are open (and not have to quit Skype/Msgr/LiveMesh before I connect).

Four speaking engagements coming up

I’m just going to list them…

This Saturday (July 18th), at Code Camp SA. I’m going to be talking about functions in SQL, particularly those that involve BEGIN and END.

At the end of the month, at the ACS Branch Conference. I’m going to be part of a panel discussing Open Source v Closed Source.

In August, I’m going to be speaking at SharePoint Saturday (Adelaide), about the integration of Reporting Services and SharePoint.

In September, I’m going to be a presenting at TechEd Australia, about SQL Azure.

Be nice if there was more of an overlap in topics…

DeepZoom fixoutlook – and a tip on hosting SilverLight

I haven’t done much SilverLight development – I’ve dabbled at best. And I don’t really count using DeepZoom Composer as proper SilverLight, but it makes a SilverLight application.

So I’ve got one at http://www.lobsterpot.com.au/fixoutlook

The background is that there’s a group of people who feel very strongly that Outlook should use proper standards for rendering emails. I understand that Microsoft have made a conscious decision to use Word for rendering emails, but I’m certainly part of the crowd who agree that Microsoft may have got this one wrong. They made a large image, and I thought this image would made a great candidate for DeepZoom. I dropped Dave a line, and he gave me the go-ahead.

So have a look around my DeepZoom application, and in particular, zoom in at the top of the “L’’ in the word outlook, to find how much you can zoom in. Look for the white lines, and you’ll see some more zooming opportunity.

And a big thanks to John O’Brien, for reminding me that when you host a SilverLight application, you need make sure the mime types are set appropriately.

The new brand is Microsoft SQL Azure

It used to be SQL Server Data Services. Then SQL Data Services. Then we started hearing about SQL Services, and now (I hope finally), we have Microsoft SQL Azure instead of SQL Services, and the Microsoft SQL Azure Database instead of SQL Data Services.

http://blogs.technet.com/dataplatforminsider/archive/2009/07/08/microsoft-sql-services-is-now-microsoft-sql-azure.aspx

I think this is a sensible move. For a while now, whenever refer to SQL Services, I have to make it clear that it’s not the Windows Services that run SQL on your server, but that it’s the “in the cloud” offering that Microsoft are bringing out. Now, I can talk about SQL Azure, and it’s very clear what it means. And SQL Azure Database is clear too, as opposed to whatever will come, like (and I’m predicting here, haven’t heard anything) SQL Azure Analytics or SQL Azure Reporting.

This is definitely a positive move, and should hopefully help people realise that this is not the same as SQL Server, help them approach their SQL Azure platform slightly differently to their On-Premise equivalent.

For example… At the moment, the SQL Azure Database platform is limited in size. I hope it will soon be unlimited, but for the time being there is a limit which will affect development decisions. If your database suddenly needs to store a lot more data than you expected, you may need to hurriedly work out how you’re going to handle this. But I think having a name which reflects the Azure nature will prompt developers into making this decision earlier.

I’m looking forward to getting my hands on SQL Azure, and seeing how my job will change because of it.

Excel dates counted differently and a reliable way of working out the day of the week

Following my recent post about 40,000 days, I got a couple of emails telling me that Excel disagrees about when the 40,000th day is. And this is true – Excel counts Day 40000 as July 6th 2009, not July 7th.

Unfortunately for Excel users, they’re wrong. And it’s down to the fact that Excel thinks that 1900 was a leap-year, when we all know it wasn’t. You see, leap-years only fall on the century year if the year is divisible by 400. So 2000 was a leap-year, but 1900 wasn’t, and 2100 won’t be either.

What’s more, this has been a problem for ages, and it was a conscious decision to introduce the bug. Essentially, Lotus 1-2-3 thought 1900 was a leap-year, and to be consistent with the market leader of the day, Microsoft treated 1900 as a leap-year too. And nowadays, we have a situation where backwards compatibility is more important. Today, if you have an application which knows that day 40000 is July 7th, then you’d better not treat your date that same way in Excel (or vice-versa).

The thing that I find really amazing with this is that Excel tells me that Feb 29, 1900 was a Wednesday. But it didn’t exist – so what happened that week? Actually (and you can check other calendars for this, including Windows’ one), Feb 28th was a Wednesday, and Excel gets the day of the week wrong for the first 59 days of its calendar.

I know you don’t care, but perhaps you should – in case you ever write an application that needs to know what day of the week it is.

Working out the day of the week is really trivial. For instance, in SQL Server, you can generally ask for the DATEPART(dw,…) of the date in question, and get a number back, telling you what day of the week it is. It’ll tell you 1 for the 1st day of the week, 2 for the 2nd, and so on.

Which is great, until you find that someone in your organisation says that Sunday is the first of the week, but someone else insists that it’s Monday. In the movie industry, I think Thursday is the first day of the week. So then, when is the 5th day? In SQL we have @@DATEFIRST, which helps a lot, but a method I like to use is to count the number of days since a known Sunday (or whatever), and take the “mod 7”. If that’s zero, I’m an exact number of weeks since that known Sunday. It works nicely, and it’s simple enough for everyone to understand (and it works regardless of location or other changeable settings).

But if you had picked your “known Sunday” in early 1900 using Excel, you’d’ve got it wrong, and your data might not work if you push your system out to SQL later (so pick something later – like 1901). I recently dealt with a date dimension that someone had put together in Excel and imported into SQL – if this data had gone back to 1900, then there would’ve certainly been errors in it (for a start, the import wouldn’t’ve worked because SQL would’ve complained that Feb 29, 1900 wasn’t a valid date).

My preference with date dimensions is to use a lot of computed columns, and only ever populate a single field. It works nicely, and it’s almost no effort to extend the table to include extra dates when required.