
SARGable functions in SQL Server

SARGable is an adjective in SQL meaning that a predicate can be satisfied by seeking into an index (assuming a suitable one exists), rather than by checking every row. Understanding SARGability can really affect your ability to write well-performing queries. Incidentally, SARGable is short for Search ARGument Able.

If you have an index on phone numbers, keyed on LastName followed by FirstName and including the suburb and address fields, you have something akin to the phone book. Obviously it becomes very easy to find people with the surname “Farley” and the first name “Rob”, but often you want to search for people with the surname “Farley” whose first name begins with ‘R’. I might be listed as “R Farley”, “R J Farley”, “Rob Farley”, “Robert Farley”, “Robert J. Farley”, or a few other variations. It complicates things even more if you need to find someone whose name shortens in a different way, like John/Jack, or Elizabeth/Betty. This is where SARGability comes into play.
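To see why the phone book works, here’s a minimal Python model of such an index. It’s my own illustration, not how SQL Server stores anything: a sorted list of (LastName, FirstName) pairs with made-up entries, and Python’s plain string ordering standing in for a real collation. Because the entries are ordered by LastName first, all the “Farley”s whose first name starts with “R” form one contiguous run, and two binary searches find its ends without reading anything else.

```python
from bisect import bisect_left

# A toy "index": (LastName, FirstName) pairs kept sorted, like a phone book.
# The names are illustrative only.
phone_book = sorted([
    ("Farley", "Anne"), ("Farley", "R J"), ("Farley", "Rob"),
    ("Farley", "Robert"), ("Farley", "Sam"), ("Fisher", "Rita"),
    ("Adams", "Ray"),
])

def seek_last_and_first_prefix(index, last, first_prefix):
    """Entries with LastName == last and FirstName starting with first_prefix.

    Works only because the index is ordered by (LastName, FirstName):
    the matches form one contiguous range, found with two binary searches.
    """
    start = bisect_left(index, (last, first_prefix))
    # Bump the last character of the prefix to get a key past every match.
    upper = first_prefix[:-1] + chr(ord(first_prefix[-1]) + 1)
    end = bisect_left(index, (last, upper))
    return index[start:end]

print(seek_last_and_first_prefix(phone_book, "Farley", "R"))
# [('Farley', 'R J'), ('Farley', 'Rob'), ('Farley', 'Robert')]
```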

Let’s just think about the first names for a minute.

If you want to find all the names that start with R, that’s easy. They’re all together and you can get to them very quickly. This is comparable to the following query in SQL Server, which takes advantage of the index on the Name column of Production.Product:

select Name, ProductID
from Production.Product
where Name like 'R%';

Looking at the Execution Plan, we see an Index Seek that finds the 52 rows. The seek has a Seek Predicate like this (visible in the operator’s ToolTip, the Properties window, or the plan XML itself):

Seek Keys[1]: Start: [AdventureWorks].[Production].[Product].Name >= Scalar Operator(N'R'), End: [AdventureWorks].[Production].[Product].Name < Scalar Operator(N'S')

This shows that the system looks at the LIKE predicate and translates it into a range: greater-than-or-equal-to ‘R’ and less-than ‘S’. (Interestingly, have a look at the End Seek Key if you ask it to find entries that start with Z.)

So the LIKE operator seems to maintain SARGability.
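Here’s a rough Python sketch of that translation, assuming a pattern that is just a literal prefix followed by %. The function names are mine, and the end key is built by bumping the last character of the prefix in code-point order; SQL Server works out its own End key according to the column’s collation, which this doesn’t attempt to model. The product names are invented.

```python
from bisect import bisect_left

def like_prefix_range(pattern):
    """Turn LIKE 'prefix%' into half-open bounds [start, end).

    Sketch only: assumes a literal, non-empty prefix and code-point ordering.
    """
    prefix = pattern[:-1]
    if not pattern.endswith("%") or not prefix or any(c in prefix for c in "%_["):
        raise ValueError("expected a literal prefix followed by %, like 'R%'")
    # An upper bound that sorts after every string starting with prefix.
    end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, end

def index_seek(sorted_names, start, end):
    """Names with start <= name < end, found by binary search rather than a scan."""
    return sorted_names[bisect_left(sorted_names, start):bisect_left(sorted_names, end)]

names = sorted(["Blade", "Race", "Reflector", "Road Frame", "Seat", "Rim"])
start, end = like_prefix_range("R%")
print(start, end)                      # R S
print(index_seek(names, start, end))   # ['Race', 'Reflector', 'Rim', 'Road Frame']
```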

If we want to consider Names that have R for the first letter, this is essentially the same question. Query-wise, it’s:

select Name, ProductID
from Production.Product
where LEFT(Name,1) = 'R';

Unfortunately the LEFT function kills the SARGability. The Execution Plan for this query shows an Index Scan (starting on page one and going to the end), with the Predicate (note: not Seek Predicate, just Predicate) “substring([AdventureWorks].[Production].[Product].[Name],(1),(1))=N'R'”. This is bad.

You see, a Predicate is checked for every row, whereas a Seek Predicate is used to seek through the index to find the rows of interest. If an Index Seek operator has both a Predicate and a Seek Predicate, then the Predicate acts as an additional filter on the rows that the Seek (using the Seek Predicate) has returned. You can see this by using LIKE 'R%r'.
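A small Python model of the difference, using made-up product names and a sorted list as the index: the Seek Predicate decides which slice of the index gets read at all, and the Predicate is then tested against every row in that slice. The trailing-r check here is case-sensitive, whereas SQL Server would follow the column’s collation. The LEFT version has no Seek Predicate, so its Predicate is tested against every row.

```python
from bisect import bisect_left

# A sorted list standing in for the index on Name (made-up values).
names = sorted(["Race", "Reflector", "Rim", "Road Frame", "Rear Derailleur", "Seat"])

def seek_then_filter(index, start, end, residual):
    """Model an Index Seek that has both a Seek Predicate and a Predicate."""
    # Seek Predicate: binary-search to the range; only rows inside it are read.
    candidates = index[bisect_left(index, start):bisect_left(index, end)]
    # Predicate: a residual filter, tested against every row the seek returned.
    return candidates, [n for n in candidates if residual(n)]

# LIKE 'R%r': seek on the 'R' prefix, then test the trailing 'r' row by row.
candidates, matches = seek_then_filter(names, "R", "S", lambda n: n.endswith("r"))
print(len(candidates), matches)  # 5 ['Rear Derailleur', 'Reflector']

# LEFT(Name, 1) = 'R': no Seek Predicate, so the Predicate sees every row.
scan_matches = [n for n in names if n[:1] == "R"]
```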

Taking just the first part of a string doesn’t change the order. SQL Server knows this because of the way it handles LIKE (when the leftmost part of the pattern is known), but it doesn’t seem to recognise it when LEFT is used. Nor does it recognise other manipulations of a column that we understand don’t affect the order, such as:

select ProductID
from Production.Product
where ProductID + 1 = 901;

This does a scan, checking every row, even though we can easily see that it means ProductID = 900. The same applies to this query (assuming there’s an index on OrderDate):

select OrderDate
from Sales.SalesOrderHeader
where dateadd(day,1,OrderDate) = '20040101'
;

And perhaps most significantly:

select OrderDate
from Sales.SalesOrderHeader
where dateadd(day,datediff(day,0,OrderDate),0) = '20040101'
;

…which is widely recognised as an effective method of date truncation (and is why you should always compare dates using >= and < instead).
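For comparison, these are the rewrites we’d do by hand, sketched in Python over sorted lists that stand in for the indexes (the data is invented). Each non-SARGable predicate is swapped for an equivalent one that only needs a binary search, which is what an Index Seek gives you. The rewrite relies on each function being order-preserving and, for the first two, invertible; it illustrates the reasoning and isn’t meant as a description of what the optimizer does.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

def seek_equal(index, key):
    """Rows equal to key, found by binary search on a sorted index (a seek)."""
    return index[bisect_left(index, key):bisect_right(index, key)]

def seek_range(index, start, end):
    """Rows with start <= key < end, found by binary search (a range seek)."""
    return index[bisect_left(index, start):bisect_left(index, end)]

def truncate_to_day(dt):
    """Analogue of dateadd(day, datediff(day, 0, dt), 0): drop the time of day."""
    return datetime(dt.year, dt.month, dt.day)

# ProductID + 1 = 901: x + 1 is strictly increasing, so invert it to ProductID = 900.
product_ids = list(range(1, 1000))  # illustrative sorted "index" on ProductID
scan_id = [p for p in product_ids if p + 1 == 901]   # Predicate checked on every row
seek_id = seek_equal(product_ids, 901 - 1)           # rewritten, seekable form

order_dates = sorted([
    datetime(2003, 12, 31), datetime(2003, 12, 31, 23, 59),
    datetime(2004, 1, 1), datetime(2004, 1, 1, 13, 45), datetime(2004, 1, 2),
])
day = datetime(2004, 1, 1)

# dateadd(day, 1, OrderDate) = '20040101' becomes OrderDate = '20031231'.
scan_add = [d for d in order_dates if d + timedelta(days=1) == day]
seek_add = seek_equal(order_dates, day - timedelta(days=1))

# Truncated date = '20040101' becomes OrderDate >= '20040101' AND OrderDate < '20040102'.
scan_trunc = [d for d in order_dates if truncate_to_day(d) == day]
seek_trunc = seek_range(order_dates, day, day + timedelta(days=1))

print(scan_id == seek_id, scan_add == seek_add, scan_trunc == seek_trunc)  # True True True
```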

But more interestingly…

…this query is just fine. Perfectly SARGable.

select OrderDate
from Sales.SalesOrderHeader
where cast(OrderDate as date) = '20040101'
;

This query does a little work to figure out a couple of constants (presumably one of them being the date 20040101, and another being 20040102), and then does an Index Seek to get the data.

You see, the date and datetime types are known to have a special relationship. Converting a datetime to a date simply drops the time portion, so values that were in order stay in order, and the comparison can be turned into a range that the index can seek on.
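Here’s a Python model of that idea, under the assumption that the two constants are the start of the day and the start of the following day (the real plan may derive a slightly different internal range). The assert checks that converting an ordered list of datetimes to dates leaves them in order, which is the property that makes the seek possible. The data is invented.

```python
from bisect import bisect_left
from datetime import date, datetime, time, timedelta

order_dates = sorted([
    datetime(2003, 12, 31, 18, 0),
    datetime(2004, 1, 1, 0, 0),
    datetime(2004, 1, 1, 9, 30),
    datetime(2004, 1, 1, 23, 59, 59),
    datetime(2004, 1, 2, 8, 15),
])

# Converting datetime -> date never reorders values (it is non-decreasing),
# so the dates of an ordered datetime index are themselves in order.
assert [d.date() for d in order_dates] == sorted(d.date() for d in order_dates)

def seek_cast_as_date(index, day):
    """Model cast(OrderDate as date) = day as a seek.

    Compute two constants (start of day, start of next day), then seek.
    """
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    return index[bisect_left(index, start):bisect_left(index, end)]

print(len(seek_cast_as_date(order_dates, date(2004, 1, 1))))  # 3
```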

It doesn’t work if you want to do something like:

select OrderDate
from Sales.SalesOrderHeader
where convert(char(8), OrderDate, 112) = '20040101'
;

…but did you really think it would? There’s no relationship between strings and dates.

I wish it did, though. I wish the SQL team would go through every function and think about how each one affects ordering. I understand that CONVERT will often change the order, but CONVERT using style 112 (yyyymmdd) won’t.
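A quick way to see the difference between styles, sketched in Python: the two helper functions mimic CONVERT styles 112 (yyyymmdd) and 101 (mm/dd/yyyy) with zero-padded formatting, and the check confirms whether mapping an ordered list of dates keeps the strings in order. Style 112 does; style 101 doesn’t, because the month comes first.

```python
from datetime import datetime

def style_112(dt):
    """Like convert(char(8), dt, 112): 'yyyymmdd', zero-padded fixed width."""
    return f"{dt.year:04d}{dt.month:02d}{dt.day:02d}"

def style_101(dt):
    """Like convert(char(10), dt, 101): 'mm/dd/yyyy'."""
    return f"{dt.month:02d}/{dt.day:02d}/{dt.year:04d}"

def preserves_order(values, f):
    """True if f never reverses the order of an already-sorted list."""
    mapped = [f(v) for v in sorted(values)]
    return all(a <= b for a, b in zip(mapped, mapped[1:]))

dates = [datetime(2003, 12, 31), datetime(2004, 1, 1), datetime(2004, 2, 15)]
print(preserves_order(dates, style_112))  # True
print(preserves_order(dates, style_101))  # False: '12/31/2003' sorts after '01/01/2004'
```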

Also, appending a constant string to a fixed-length string shouldn’t change the order. So really, this should be able to work:

select OrderDate
from Sales.SalesOrderHeader
where convert(char(6), OrderDate, 112) + '01' = '20040101'
;

But it doesn’t.
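Here’s a sketch in Python of why it could work: the expression only ever produces ‘yyyymm01’, so an equality test against ‘20040101’ is really asking for every order in January 2004, which is a single half-open range on OrderDate. The helper names are mine, and month_range returns None for a target that no date could produce.

```python
from datetime import datetime

def yyyymm01(dt):
    """Like convert(char(6), dt, 112) + '01': always 'yyyymm' followed by '01'."""
    return f"{dt.year:04d}{dt.month:02d}01"

def month_range(s):
    """Invert yyyymm01(dt) == s into a half-open datetime range [start, end).

    Returns None when s doesn't end in '01', since no date could produce it.
    """
    if len(s) != 8 or not s.endswith("01"):
        return None
    start = datetime(int(s[:4]), int(s[4:6]), 1)
    # First day of the following month (December rolls over to January).
    end = datetime(start.year + start.month // 12, start.month % 12 + 1, 1)
    return start, end

orders = [
    datetime(2003, 12, 31, 22, 0), datetime(2004, 1, 1),
    datetime(2004, 1, 20, 9, 0), datetime(2004, 2, 1),
]
start, end = month_range("20040101")
by_function = [d for d in orders if yyyymm01(d) == "20040101"]  # evaluated per row
by_range = [d for d in orders if start <= d < end]              # seekable form
print(by_function == by_range)  # True
```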

Interestingly (and a prompt for this post), the hierarchyid type isn’t too bad. It understands that some functions, such as GetAncestor, won’t change the order, and it keeps them SARGable. In the Stack Overflow question below, the asker had noticed that GetAncestor and IsDescendantOf are functions that don’t kill the SARGability, basically because the left-most bits of a hierarchyid are the parent nodes.

http://stackoverflow.com/questions/2042826/how-does-an-index-work-on-a-sql-user-defined-type-udt

Spatial types can show similar behaviour.
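The hierarchyid behaviour can be modelled in Python by treating each node as a tuple of its path from the root (so /1/3/2/ becomes (1, 3, 2)). That representation is an assumption for illustration only; the real type uses a compact binary encoding, but either way sorting gives depth-first order. Because every descendant shares its ancestor’s path as a prefix, IsDescendantOf becomes one range between a node and its next sibling, and GetAncestor is just a matter of dropping levels (returning None, as SQL Server returns NULL, when you go past the root).

```python
from bisect import bisect_left

# Each node is its path from the root, e.g. /1/3/2/ -> (1, 3, 2).
# Sorting these tuples gives depth-first order, like a hierarchyid index.
nodes = sorted([(), (1,), (1, 1), (1, 3), (1, 3, 2), (1, 3, 2, 5), (1, 4), (2,)])

def descendants_of(index, node):
    """Seek for IsDescendantOf(node); a node counts as its own descendant.

    All descendants share node's path as a prefix, so they sit between node
    and its next sibling: a single range, not a scan.
    """
    if not node:  # the root: everything is a descendant
        return list(index)
    next_sibling = node[:-1] + (node[-1] + 1,)
    return index[bisect_left(index, node):bisect_left(index, next_sibling)]

def get_ancestor(node, n):
    """Like GetAncestor(n): drop the last n levels, or None past the root."""
    if n > len(node):
        return None
    return node[:len(node) - n]

print(descendants_of(nodes, (1, 3)))  # [(1, 3), (1, 3, 2), (1, 3, 2, 5)]
print(get_ancestor((1, 3, 2), 1))     # (1, 3)
```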

So I get the feeling that one day we might see the SQL Server team implement some changes with the optimizer, so that it can handle a lot more functions in a SARGable way. Imagine how much code would run so much better if order-preserving functions were more widely recognised. Suddenly, large amounts of code that wasn’t written with SARGability in mind would start running quicker, and we’d all be hailing the new version of SQL Server.

I’ve raised a Connect item about this, at https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=526431

You may have code that would run thousands of times faster with this change. That code may live in third party applications over which you have no control at all. If you think there’s a chance you fall into that bracket, why not go and vote this up?

High ROI items for SQL Server 2008

To persuade your boss to embrace an upgrade to SQL 2008, you need to know which features have high Return On Investment. They may have seen presentations talking about features like Spatial, or MERGE (and been quite impressed), but they may well have left those presentations thinking about the effort that would be involved in rewriting applications to take advantage of these features. It’s all well and good to see your customers on a map, but someone has to make that spatial data appear somewhere.

This post is a callout for features that will benefit you (and your boss) as soon as you do the upgrade (or soon after). And I welcome comments to list other items as well.

  • Block Computation (in SSAS – which reduces the effort in processing significantly, for no change in the application)
  • Transparent Data Encryption (in the Database Engine – which makes sure that data at rest is encrypted, with no change in the application)
  • Backup Compression (which reduces the size of backups, and can be set as the default so that existing backup scripts don’t need to change)
  • Data Compression (minimal change to turn on compression on tables which will compress nicely)
  • Filtered Indexes (because how far off is your next index creation, really?)
  • Auditing & Change Data Capture (because it’s very easy to turn on and then review the data as you need it)
  • Export to Word in SSRS (because everyone’s wanted this for so long)
  • SSRS paging (because SSRS used to get _all_ the data for a report before rendering it – but not in 2008)
  • Resource Governor (easy to set up, nice to have in place for when you might want it)
  • Hot-add memory (so that you can just plug in more memory without having to do restarts)

I’m not suggesting that an upgrade should be done flippantly. You should still consider the effort of thoroughly testing your system under SQL 2008. But hopefully this list can highlight some of the things that I’ve found are good persuaders. A list of “What’s New in SQL 2008” can be found at http://www.microsoft.com/sqlserver/2008/en/us/whats-new.aspx

Like I said, you may have other items on your own list, and I invite you to comment on this. You may also already have things in place to handle encryption, or be running HyperBac or one of the other backup compression tools.

Missing Index in SQL Server 2008 – should try harder!

Ok, maybe I’m being a little harsh, but I just feel like it should be better.

Let me show you what’s nice about the way that missing indexes are handled in SQL Server 2008.

Using AdventureWorks (not AdventureWorks2008) on a SQL Server 2008 install, if I show the Execution Plan from this simple query, I get a nice suggestion. My query…

select productid, orderqty
from sales.salesorderdetail
where carriertrackingnumber = 'FB88-4B92-82';

…could be improved through better indexing. It uses 1740 reads to get this data, which seems awful. The system shows me that it could be improved, and suggests an index.


It’s there, in green. It says:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [Sales].[SalesOrderDetail] ([CarrierTrackingNumber])

(that’s right, no semi-colon on the end, but I’m fine about that)

What I’m not fine with is the fact that this index isn’t actually ideal. If I create it (supplying a name, of course), we see it’s being used, but it should be clear that a better index is possible.


This query uses 42 reads to get the required information, which is significantly better than 1740, but still not brilliant. In fact, 42 is about 2.4% of 1740, so it’s hardly the 99.6512% improvement that Management Studio suggested would be seen.

My preference would be to consider that Key Lookup as well. It’s taking 92% of the cost of this improved query. We can avoid the Key Lookup by creating an index which INCLUDEs the columns we’re interested in, like this:

CREATE NONCLUSTERED INDEX [MyNewIndex2] ON [Sales].[SalesOrderDetail] ([CarrierTrackingNumber]) INCLUDE (ProductID, OrderQty);
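If you wanted a suggestion that accounts for the Key Lookup, the rule is simple enough to sketch. This hypothetical Python helper (not part of any SQL Server tooling) puts the WHERE-clause columns in the key and every other column the query selects into INCLUDE; fed the columns from this query, it produces the same statement as above.

```python
def covering_index_ddl(table, name, key_columns, select_columns):
    """Build CREATE INDEX text whose INCLUDE list covers the selected columns.

    Hypothetical helper: keys come from the WHERE clause, and every other
    selected column goes into INCLUDE so that no Key Lookup is needed.
    """
    includes = [c for c in select_columns if c not in key_columns]
    ddl = f"CREATE NONCLUSTERED INDEX [{name}] ON {table} ({', '.join(key_columns)})"
    if includes:
        ddl += f" INCLUDE ({', '.join(includes)})"
    return ddl + ";"

print(covering_index_ddl("[Sales].[SalesOrderDetail]", "MyNewIndex2",
                         ["[CarrierTrackingNumber]"], ["ProductID", "OrderQty"]))
```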

If I create this index, we see that the execution plan becomes just the Index Seek (on my new index), and the number of reads drops to just 3. Yes, 3. That’s 0.17% of the reads of the original query, and only 7.1% of the reads of the improved query! 99.83% of the original reads have been eliminated, which is much more like the figure promised by my Missing Index suggestion, except that the index it suggested doesn’t deliver it.
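The percentages above are easy to check with a few lines of Python, using the read counts from this post. For comparison, Management Studio’s estimate for its suggested index was a 99.6512% improvement.

```python
def pct(part, whole):
    """part as a percentage of whole, to two decimal places."""
    return round(100 * part / whole, 2)

original, suggested, covering = 1740, 42, 3  # logical reads from the three plans

print(pct(suggested, original))                  # 2.41  (suggested index vs original)
print(round(100 - pct(suggested, original), 2))  # 97.59 (saving from the suggested index)
print(pct(covering, original))                   # 0.17
print(pct(covering, suggested))                  # 7.14
print(round(100 - pct(covering, original), 2))   # 99.83
```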


I like the idea of detecting Missing Indexes, and I love the fact that it suggests these in Execution Plan viewer… I just want it to be slightly better by considering INCLUDEd columns.

I’ve suggested this be improved on the Connect site at https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=375024

Ctrl+1 in SSMS for sp_who, plus more

I did this by accident, but it turns out to be a really useful feature. I was just trying to zoom in on something using ZoomIt, that really useful tool that I use whenever I’m presenting these days. But I didn’t have ZoomIt running, and so SQL Server Management Studio ran sp_who.

And it’s not even new. This is an old Query Analyzer thing; I just hadn’t come across it before. And it’s extensible! Go to Tools, Options, Environment, Keyboard. By default, sp_who is Ctrl+1, sp_lock is Ctrl+2 and sp_help is Alt+F1 (which runs on whatever you have highlighted). Then pick one of the shortcuts that you’re allowed to set for yourself, and put a command in. I can’t believe I haven’t stumbled across it before. I’ve already put sp_helpindex on Ctrl+3, and sp_helptext on Ctrl+4.

The only complaint I have about it is that if you select a two-part object name and hit Alt+F1 (the shortcut for sp_help), it fails because there aren’t quotes around the name. I’d like to be able to wrap the selection up, to make it do “sp_help '*'”, where the star refers to the selected text. I thought this would be worthy of posting to Connect, but Michael Swart has already posted something similar. No-one’s voted on this yet, but I think it could be really useful (so please, go vote, add comments, all kinds of stuff). It’s great to be able to call sp_helpindex when highlighting a table (I’ve just added this one), but if this breaks whenever I need to specify a schema, then it’s just a little less useful. I’d even like to be able to have something which runs a whole query, using my highlighted text somewhere in there.
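To make the proposal concrete, here’s a hypothetical Python sketch of the substitution. SSMS has no such feature, and the function is mine: the selected text replaces a quoted '*' placeholder, with any embedded single quotes doubled so the result stays a valid T-SQL string literal. Only the quoted placeholder is replaced, so a bare * in a template (as in select *) is left alone.

```python
def expand_shortcut(template, selection):
    """Hypothetical: replace a quoted '*' placeholder with the selection as a literal."""
    literal = "'" + selection.replace("'", "''") + "'"  # double embedded quotes
    return template.replace("'*'", literal)

print(expand_shortcut("sp_help '*'", "Sales.SalesOrderHeader"))
# sp_help 'Sales.SalesOrderHeader'
print(expand_shortcut(
    "select * from sys.dm_db_index_usage_stats where object_id = object_id('*')",
    "Sales.SalesOrderHeader"))
```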

At the moment I’m playing around with having a keyboard shortcut for:

select * from sys.dm_db_index_usage_stats

and then highlighting:

where object_id = object_id('sales.salesorderheader')
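This trick, and the Alt+F1 failure with two-part names, both make sense if SSMS simply appends the highlighted text after the shortcut’s command, separated by a space. That’s my assumption about the mechanism, modelled in Python below; it isn’t taken from any SSMS documentation, and run_shortcut is a made-up name.

```python
def run_shortcut(command, selection=""):
    """Assumed model: the highlighted text is appended to the shortcut's command."""
    return f"{command} {selection}".strip()

print(run_shortcut("select * from sys.dm_db_index_usage_stats",
                   "where object_id = object_id('sales.salesorderheader')"))
# A two-part name arrives unquoted, which is why Alt+F1 fails on it:
print(run_shortcut("sp_help", "sales.salesorderheader"))  # sp_help sales.salesorderheader
```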

What do you have on your keyboard shortcuts?

Design Query in Editor bug

Ok, so real database developers don’t use the graphical “Design Query in Editor”… yeah, I know. Sure, there’s the odd time when you’re typing a query and you don’t have an Object Explorer (eg, in SSIS) and a moment of weakness sees you hit “Build Query” to save some typing, but in general I encourage people to write their queries in Management Studio (SSMS) and then copy them into the SSIS dialog. I was showing someone some of the frustrations I have with the graphical editor, and came across a real beauty: repeated predicates.

I logged a bug at the Connect site, where I wrote the rest of the details. You should vote for this – there may be a time when you click “Build Query” to avoid some typing, and you don’t want to look like an idiot for repeating the same line multiple times.

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=352874

Fuzzy in T-SQL

SQL Server gives you Fuzzy Lookups and Fuzzy Grouping, but only in SQL Server Integration Services. It’s not even on the list of SSIS enhancements for SQL Server 2008.

This week at the MVP Summit, I was having a discussion about this with Jamie Thomson, and we had a chat with one of the T-SQL guys to suggest it. The response came back with “Log it on Connect”, which I did just now. https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=338664

Please check it out and provide comments, votes, validations. All this will help persuade Microsoft to implement this useful feature. And tell your friends too! The more support it receives, the more likely it is to be implemented.