A few changes…

It's been an interesting month. Today I've just come home from hospital, having had an inflamed appendix removed. I'm not talking about the back of the book that I set on fire – rather, it was the reason for a large amount of gut-ache on Monday. I went to the doctor on Tuesday morning, and he sent me to the hospital, where they found the earliest opportunity to remove it (which was Wednesday – at least it hadn't burst yet). They did this by going through my belly-button, blowing a bunch of air in to inflate my abdomen, and then using a camera and telescopic tools to remove it. They made two other holes to help, but the keyhole approach meant I could come home today (Thursday). I'm still feeling rather lousy, and having trouble concentrating, but I'm essentially fine, and will be fit to catch flights again by the weekend. Annoyingly, I didn't get to keep the appendix (in a jar, to give to trick-or-treaters tomorrow night), because it's been sent off for tests. They'll let me know if the inflammation was caused by anything more sinister. Chances are minimal though, and I'm sure God will make sure everything's fine.

It still ruined my week though – I had things I was planning to get done.

So that’s one change…

The other major one is that my employment situation has changed. I'm still very much a mentor at SQLskills Australia (previously known as Solid Quality Learning Pty Ltd), but I'm no longer an employee there. I have set up my own company, called LobsterPot Solutions, and will be operating through that. I will continue to teach through SQLskills, and do not expect to be in competition with them in any way. I hope that I can help them achieve as much success under this arrangement as I could as an employee.

Through LobsterPot Solutions I will be available for consulting and mentoring assignments, primarily in the SQL Server space – but will probably expand the range of offerings to include web (including Silverlight), SharePoint, Dynamics CRM, and others – areas that I’ve worked in over the years and wouldn’t mind getting back to. In time I will bring on staff, so that I can broaden my capabilities, and you’ll probably see my kids doing work experience in a few years.

I'm sure SQL Server will always be where I spend most of my time, and where I continue to maintain my expertise. I don't plan to let my interest in other areas dampen my SQL passion at all, but rather to enhance that 'usefulness of data' aspect which I have always enjoyed.

As for those things I was planning to get done this week… I need to finish off the website, finish organising business cards, chase down some charities, write up some more of my T-SQL Tips, work on some book chapters, and a few other things besides. But I guess a hospital stay is a reasonable excuse.

Missing Index in SQL Server 2008 – should try harder!

Ok, maybe I’m being a little harsh, but I just feel like it should be better.

Let me show you the niceness of the way that missing indexes are handled in SQL Server 2008.

Using AdventureWorks (not AdventureWorks2008) on a SQL Server 2008 install, if I show the Execution Plan from this simple query, I get a nice suggestion. My query…

select productid, orderqty
from sales.salesorderdetail
where carriertrackingnumber = 'FB88-4B92-82';

…could be improved through better indexing. It uses 1740 reads to get this data, which seems awful. The system shows me that it could be improved, and suggests an index.
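
(In case you're wondering where the reads figure comes from: I'm just turning STATISTICS IO on before running the query, something like this, and reading the 'logical reads' value from the Messages tab. Your exact numbers may vary with your copy of AdventureWorks.)

SET STATISTICS IO ON;
select productid, orderqty
from sales.salesorderdetail
where carriertrackingnumber = 'FB88-4B92-82';
-- The Messages tab then reports something like:
-- Table 'SalesOrderDetail'. Scan count 1, logical reads 1740, ...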

[Screenshot: the execution plan, with the missing index suggestion shown in green]

It’s there, in green. It says:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [Sales].[SalesOrderDetail] ([CarrierTrackingNumber])

(that’s right, no semi-colon on the end, but I’m fine about that)

What I'm not fine with is the fact that this index isn't actually so ideal. If I create it (supplying a name, of course), we see it's being used, but it should be clear that a better index could do better still.
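
(Creating it just means filling a name into that template, something like this, where MyNewIndex is simply a name I've picked:)

CREATE NONCLUSTERED INDEX [MyNewIndex] ON [Sales].[SalesOrderDetail] ([CarrierTrackingNumber]);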

[Screenshot: the execution plan using the new index, with a Key Lookup alongside the Index Seek]

This query uses 42 reads to get the required information, which is significantly better than 1740, but still not brilliant. In fact, 42 is about 2.4% of 1740 (a 97.6% improvement), so it's hardly the 99.6512% improvement that Management Studio suggested would be seen.

My preference would be to consider that Key Lookup as well. It's taking 92% of the cost of this improved query. We can avoid the Key Lookup by creating an index which INCLUDEs the columns we're interested in. Like this:

CREATE NONCLUSTERED INDEX [MyNewIndex2] ON [Sales].[SalesOrderDetail] ([CarrierTrackingNumber]) INCLUDE (ProductID, OrderQty);

If I create this index, we see that the execution plan becomes just the Index Seek (on my new index), and the number of reads drops to just 3. Yes, 3. That's 0.17% of the original query, and only 7.1% of the reads of the improved query! 99.83% of the original reads have been eliminated – much more like the figures promised by my Missing Index suggestion, except that the suggestion got the index wrong.

[Screenshot: the execution plan, now just an Index Seek on the new index]

I like the idea of detecting Missing Indexes, and I love the fact that it suggests them in the Execution Plan viewer… I just want it to be slightly better, by considering INCLUDEd columns.

I’ve suggested this be improved on the Connect site at https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=375024
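
As an aside, the same missing index information is available server-wide through the missing index DMVs, if you'd rather not rely on spotting the green hint in a plan. This is only a rough sketch of the sort of query I mean, not a polished script:

select mid.statement, mid.equality_columns, mid.inequality_columns, mid.included_columns,
       migs.user_seeks, migs.avg_user_impact
from sys.dm_db_missing_index_details mid
join sys.dm_db_missing_index_groups mig on mig.index_handle = mid.index_handle
join sys.dm_db_missing_index_group_stats migs on migs.group_handle = mig.index_group_handle
order by migs.user_seeks * migs.avg_user_impact desc;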

Doing the maths to understand SQL optimiser choices

The second of my tips from my talk (slides and scripts available) was about doing the maths to understand why the SQL Server optimiser behaves in the way that it does – particularly in relation to choosing whether to scan an index/heap, or whether to seek a less-suitable index for some of the data and then do a lookup to fetch the rest. This applies in SQL Server 2005 just fine, by the way, as do all the scripts from my talk.

The basic gist of it is that the system will decide to use a plan which minimises the number of reads required to get all the data. My example scripts compare a scenario of looking up employees in AdventureWorks.HumanResources.Employee by ManagerID. There's a Non-Clustered Index on ManagerID (which naturally then includes EmployeeID, being the Clustered Index key). But if we also want the LoginID field, then our NCIX isn't quite so handy.
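
The query in question is essentially something like this (the scripts from the talk have the full detail; the 16 is just an example ManagerID):

select EmployeeID, LoginID
from HumanResources.Employee
where ManagerID = 16;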

The choice of how to implement this comes down to basic mathematics. First, a couple of useful facts:

  • The whole Employee table (CIX in the physical representation) is on 9 pages.
  • Seeks on small indexes typically take 2 reads – one to find out which page the row is on, and the other to get that page.
  • Lookups are essentially the same kind of thing as seeks.

So let's consider the two options here – scanning the table vs seeking the index and looking up the rest.

Scanning a table is always going to be 9 reads. Easy.

Seeking the index is going to take a couple of reads (maybe even three) to find the EmployeeIDs for the people that report to the manager of interest. But then it's going to require another two reads for every employee in that list, to get the LoginID field out of the CIX. So it's going to be 2+2n reads (where n is the number of employees).

So if our manager has 2 reports, we get 2+2×2=6 reads using the seek, and the system will use this method.

But if our manager has 4 reports, we get 2+2×4=10 reads using the seek, which makes it better to just scan the CIX instead (only 9 reads).

Naturally, if we change our index to include the LoginID, the query then takes 2 reads, whether the manager has 2 or 4 direct reports. It could take more if those records are stored over multiple pages.
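
That changed index would be something like this. I'm assuming the existing index is called IX_Employee_ManagerID (I believe that's its name in AdventureWorks), and rebuilding it with the extra column included:

CREATE NONCLUSTERED INDEX [IX_Employee_ManagerID] ON [HumanResources].[Employee] ([ManagerID])
INCLUDE (LoginID)
WITH (DROP_EXISTING = ON);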

Understanding this simple bit of maths can really impact your indexing strategy, and your appreciation of the way that T-SQL works.

Presenting at ADNUG this week

I haven’t been to ADNUG for a while. I love that the group is there, but over the past year or more, I haven’t prioritised getting to the meetings. I’ve been to meetings of equivalent groups in Melbourne and Sydney, but not to the Adelaide .Net Group for a while.

But this week I will! I've offered to be a stand-in presenter, and will present a few tips around T-SQL. I'm presenting a similar talk in Wagga this coming weekend, so it'll give me a chance to work out my talk in advance. I regularly present and teach T-SQL things, so I'll be picking a few of the more useful suggestions (particularly those that get a good response from the audience), and showing some of them. People who were at one of my recent presentations may have heard bits before, but definitely not all of them.

Congratulations, Mitch

A good friend of mine from Perth got awarded MVP status this past week. Mitch Wheat runs the .Net User Group over there, and does a terrific job. He does a lot for the community, and is incredibly smart. I’m really pleased for him.

PS: There are a bunch of other new MVPs this October as well (plus I got re-awarded) – I'm just mentioning Mitch because he's a good friend and I'm so pleased for him!

OUTPUT clause – knowing what goes in, and what you’ve accidentally taken out

The OUTPUT clause has to be one of the best T-SQL features out there. It was new in SQL Server 2005, but it’s still remarkably little known. I guess like many of the features that were introduced in SQL 2005, many people just make do with the way they did things before.

The basic point of the OUTPUT clause is to turn a data modification statement into one that returns data (although this data can be redirected to an existing table by simply adding “INTO tablename” after it), by providing a way of accessing the inserted and deleted tables that we have been using in triggers for years. If a DELETE statement is being issued, only the deleted table is available, and similarly inserted for an INSERT statement. UPDATE provides both – and the MERGE statement (new in SQL Server 2008) populates both (although it feels as if a FULL JOIN has been applied to them).
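
A couple of quick examples to show the shape of it. These are against a made-up table, not anything in AdventureWorks:

-- DELETE: only the deleted table is available
delete from dbo.SomeTable
output deleted.*
where SomeDate < '20070101';

-- UPDATE: both tables are available, so you can see before and after values
update dbo.SomeTable
set Price = Price * 1.1
output deleted.Price as OldPrice, inserted.Price as NewPrice
where Category = 'Widgets';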

I'm sure it's easy to imagine the tremendous advantage to using it with UPDATE statements – auditing. It's very easy to push the changes to an audit table, without the need to create a trigger to do it for us. But I find that the biggest advantage is for those times when carelessness has got the better of us. Hopefully these times are rare (and the more experienced we get with databases the more we seem to respect the sanctity of the data), but if you always use "OUTPUT deleted.*" when deleting, or "OUTPUT deleted.*, inserted.*" when updating, then the odd time when you see more rows come back than you expected, you can have a plan associated with your "Oops" moment.
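
Going back to the auditing idea, pushing those changes straight into an audit table is just a matter of the INTO option. Something like this, where the audit table is imaginary and would need columns to match the OUTPUT list:

update dbo.SomeTable
set Price = Price * 1.1
output inserted.SomeTableID, deleted.Price, inserted.Price, getdate(), suser_sname()
into dbo.SomeTableAudit (SomeTableID, OldPrice, NewPrice, ChangedAt, ChangedBy)
where Category = 'Widgets';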

If you don’t use the OUTPUT clause, you get a message that tells you how many rows were affected by your query. If you expect that number to be small, and it’s actually quite large, you’ve probably done something wrong. “Oops” is probably an understatement. You may have started a transaction and be able to roll it back, but until such time as you get to that, you have locks which aren’t being released. Regardless of whether or not you can roll it back, having something which shows you what you’ve just done can really help you out. You can copy the data presented into Excel, or Notepad, or whatever, and work out your problem. If you can’t roll it back, then this may involve some sort of import process being quickly thrown together.

The benefit is far less obvious when inserting data – but in some ways, it’s actually even more useful.

We've had the @@IDENTITY and SCOPE_IDENTITY() functions available for some time, and they're widely used. But if multiple records are inserted, or if the targeted table doesn't contain an identity field, then they're actually not quite so great. Knowing which record is which is a question of re-querying the data and hoping you can tell. But if you "OUTPUT inserted.*", your application can immediately tell which row was inserted with which surrogate key (it's less of an issue if you use a natural primary key of course). With a large number of systems opting to use guids for PKs, defaulting to a value of newid(), it really helps to have an OUTPUT clause so that the guid doesn't have to be generated prior to performing the insert.
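
For example, something like this (again, made-up tables, with Orders.OrderID being a uniqueidentifier defaulting to newid()) hands the application the generated keys straight back, without generating the guid up front or re-querying afterwards:

insert into dbo.Orders (CustomerName, OrderDate)
output inserted.OrderID, inserted.CustomerName
select CompanyName, getdate()
from dbo.ProspectiveCustomers
where Approved = 1;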

The biggest caveat with the OUTPUT clause is that it can be ruined by triggers. A trigger being called can prevent the inserted and deleted tables from being available at the end of the statement. I’d like to have a way of stating that the OUTPUT clause should still work, returning the state of the inserted and deleted tables prior to any triggers being called, but I suppose I can understand the logic behind the decision to disallow it.

This was one of the tips in my TechEd Australia presentation, which was re-delivered to user groups in Adelaide and Melbourne. The scripts and slides are available for download. This was one of the more popular tips, based on feedback.