SARGable functions in SQL Server

SARGable is an adjective in SQL that means that an item can be found using an index (assuming one exists). Understanding SARGability can really impact your ability to have well-performing queries. Incidentally – SARGable is short for Search ARGument Able.


If you have an index on phone numbers using LastName, followed by FirstName, including the suburb and address fields, you have something akin to the phone book. Obviously it becomes very easy to find people with the surname “Farley”, with the first name “Rob”, but often you want to search for people with the surname “Farley” with the first name beginning in ‘R’. I might be listed as “R Farley”, “R J Farley”, “Rob Farley”, “Robert Farley”, “Robert J. Farley”, or a few other variations. It complicates things even more if you need to find someone with a name that shortens a different way, like John/Jack, or Elizabeth/Betty. This is where SARGability comes into play.


Let’s just think about the First names for a minute.


If you want to find all the names that start with R, that’s easy. They’re all together and you can get to them very quickly. This is comparable to a query in SQL Server like this, (taking advantage of the index on the Name column in Production.Product)


select Name, ProductID
from Production.Product
where Name like ‘R%’ ;


Looking in the Execution Plan, we see an Index Seek to find the 52 rows, and the seek has a Seek Predicate like this (by looking in either the ToolTip of the operator, the Properties window, or the XML itself):


Seek Keys[1]: Start: [AdventureWorks].[Production].[Product].Name >= Scalar Operator(N’R’), End: [AdventureWorks].[Production].[Product].Name < Scalar Operator(N’S’)


This shows that the system looks as the LIKE call, and translates it into a greater-than and less-than query. (Interestingly, have a look at the End Seek Key if you tell it to find entries that start with Z)


So the LIKE operator seems to maintain SARGability.


If we want to consider Names that have R for the first letter, this is essentially the same question. Query-wise, it’s:


select Name, ProductID
from Production.Product
where LEFT(Name,1) = ‘R’ ;


Unfortunately the LEFT function kills the SARGability. The Execution Plan for this query shows an Index Scan (starting on page one and going to the end), with the Predicate (not, not Seek Predicate, just Predicate) “substring([AdventureWorks].[Production].[Product].[Name],(1),(1))=N’R’”. This is bad.


You see, a Predicate is checked for every row, whereas a Seek Predicate is used to seek through the index to find the rows of interest. If an Index Seek operator has both a Predicate and a Seek Predicate, then the Predicate is acting as an additional filter on the rows that the Seek (using the Seek Predicate) has returned. You can see this by using LIKE ‘R%r’


Considering the first part of a string doesn’t change the order. SQL knows this because of the way it handles LIKE (if the left of the string is known), but it doesn’t seem to get this if LEFT is used. It also doesn’t get it if you manipulate a field in other ways that we understand don’t affect the order.


select ProductID
from Production.Product
where ProductID + 1 = 901;


This is doing a scan, checking every row, even though we can easily understand what we mean. The same would apply for this query (assuming there’s an index on OrderDate):


select OrderDate
from Sales.SalesOrderHeader
where dateadd(day,1,OrderDate) = ‘20040101’
;


And perhaps most significantly:


select OrderDate
from Sales.SalesOrderHeader
where dateadd(day,datediff(day,0,OrderDate),0) = ‘20040101’
;


…which is largely recognised as being an effective method for date truncation (and why you should always compare dates using >= and < instead)


But more interestingly…


…this query is just fine. Perfectly SARGable.


select OrderDate
from Sales.SalesOrderHeader
where cast(OrderDate as date) = ‘20040101’
;


This query does a little work to figure out a couple constants (presumably one of them being the date 20040101, and another being 20040102), and then does an Index Seek to get the data.


You see, the date and datetime fields are known to have a special relationship. The date type is essentially the left-most three bytes of a datetime type, and therefore the ordering is identical.


It doesn’t work if you want to do something like:


select OrderDate
from Sales.SalesOrderHeader
where convert(char(8), OrderDate, 112) = ‘20040101’
;


…but did you really think it would? There’s no relationship between strings and dates.


I wish it did though. I wish the SQL team would go through every function and think about how they work. I understand that CONVERT will often change the order, but convert using style 112 won’t.


Also, putting a constant string on the end of a constant-length string shouldn’t change the order. So really, this should be able to work:


select OrderDate
from Sales.SalesOrderHeader
where convert(char(6), OrderDate, 112) + ’01’ = ‘20040101’
;


But it doesn’t.


Interestingly (and a prompt for this post), the hierarchyid type isn’t too bad. It understands that some functions, such as getting the Ancestor won’t change the order, and it keeps it SARGable. Here the asker had noticed that GetAncestor and IsDescendantOf are functions that don’t kill the SARGability – basically because the left-most bits of a hierarchyid are the parent nodes.


http://stackoverflow.com/questions/2042826/how-does-an-index-work-on-a-sql-user-defined-type-udt


Spatial types can show similar behaviour.


So I get the feeling that one day we might see the SQL Server team implement some changes with the optimizer, so that it can handle a lot more functions in a SARGable way. Imagine how much code would run so much better if order-preserving functions were more widely recognised. Suddenly, large amounts of code that wasn’t written with SARGability in mind would start running quicker, and we’d all be hailing the new version of SQL Server.


I’ve raised a Connect item about this, at https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=526431


You may have code that would run thousands of times faster with this change. That code may live in third party applications over which you have no control at all. If you think there’s a chance you fall into that bracket, why not go and vote this up?

T-SQL Tuesday – HAVING Puzzle answer

Earlier today you may have seen a blog post of mine about a puzzle involving HAVING. You should read that post before this one. It was part of Adam Machanic’s T-SQL Tuesday meme.

The question was about the query:

SELECT ‘No Rows’
WHERE 1=2
HAVING 1=1;

And here’s the explanation.

Start by making yourself a "dual table", like what you’d use in Oracle, and use this instead of having no FROM clause. Put a row in it.

CREATE TABLE dual (dummy bit);
INSERT dual VALUES (1);

–Now count the rows in it
SELECT COUNT(*)
FROM dual;

–Now count how many rows don’t match 1=2 (of course, the answer is zero)
SELECT COUNT(*)
FROM dual
WHERE 1=2;

–Naturally we’d get nothing back if we weren’t grouping
SELECT ‘Something’
FROM dual
WHERE 1=2;

–But HAVING forces the grouping functionality as well (like using COUNT(*))
SELECT ‘Something’
FROM dual
WHERE 1=2
HAVING 1=1;

–So in this query, we couldn’t put any of our real columns in, only aggregate functions and constants
SELECT *
–Errors
FROM dual
WHERE 1=2
HAVING 1=1;

–And leaving out the FROM clause implies that we’re asking all this of a secret internal table with a single row. All these queries work just the same without the FROM clause at all.

–Count the rows in our pretend table (one)
SELECT COUNT(*)

–Now count how many rows don’t match 1=2 (zero)
SELECT COUNT(*)
WHERE 1=2;

–Naturally we’d get nothing back if we weren’t grouping
SELECT ‘Something’
WHERE 1=2;

–But HAVING forces the grouping functionality as well
SELECT ‘Something’
WHERE 1=2
HAVING 1=1;

So the answer to the question posed is that you get a single row, containing the text provided. The fact that I used the text ‘No Rows’ was just a bit of fun.

Now, to remove the trivia a little…

When would you ever use HAVING without GROUP BY in a practical situation?

How about this:

Using sp_MSforeachdb, find the number of objects in non-system databases. It’s an undocumented system stored procedure which runs a query on each database, replacing a question mark in the query with the name of the database. It can be quite handy, just don’t look at how it’s implemented.

EXEC sp_MSforeachdb ‘SELECT ”?”, COUNT(*) FROM ?.sys.objects WHERE ”?” NOT IN (”master”,”tempdb”,”model”,”msdb”);';

But this won’t do it. It will still return the entries for the system databases, but with zeroes (because none of the objects satisfied the WHERE clause). Replace WHERE with HAVING and it’s just fine – the rows get eliminated from the resultset.

EXEC sp_MSforeachdb ‘SELECT ”?”, COUNT(*) FROM ?.sys.objects HAVING ”?” NOT IN (”master”,”tempdb”,”model”,”msdb”);';

Honestly, HAVING doesn’t require a GROUP BY clause. It doesn’t require anything. It filters based on groups, and if there are no groups yet, it makes some – like how using an aggregate will count the rows in an empty set and return one row representing that group.

It’s generally taught as "HAVING is for filtering based on aggregates", and that’s true, but only half the story. And I find that if I’m teaching people to write better queries, I want them to have a thorough understanding of what each construct is really doing.

T-SQL Tuesday – T-SQL Puzzle with HAVING

Adam’s hosting another T-SQL Tuesday, for which this post is jumping in. He’s themed it around T-SQL Puzzles, which I found quite interesting, because the world is full of them.

Most of the questions that I answer on forums, help sites, and so on, are puzzles. I guess there’s the difference between “Problem” and “Puzzle”, but I prefer to think of thing as puzzles.

For Adam’s meme though, I thought I’d share a Puzzle that I ask students who take my Advanced T-SQL course. The idea is to have them start thinking about what each component of T-SQL is actually doing, so that they can better address problems they face. If you have a rifle, it’s nice to actually know what the various components of it are for, so that you can use it more effectively.

I actually ask them a large number of things, but the one that I thought I’d pose for you all today is about the results of this. The answer will be in the next blog post, which hopefully you haven’t read yet. I will have them both published on Tuesday 12th, this one at the start of the day, and the answer towards the end of the day.

The question is simply a query. Can you predict the output, and explain why? Feel free to comment to your heart’s content, as I will moderate them and only publish them afterwards. In fact, I’ll probably take a few days to get to them (being holiday period), so I apologise if you’re wanting to read what other people thought too.

Naturally, you can check your answer by actually running the query, but please provide your thoughts before you do. The query is below. There is no FROM clause. There is no GROUP BY clause. Does it error, do you get an empty resultset, do you get a single row containing NULL, do you get a single row with data, do you get multiple rows, or something else I haven’t suggested? Enjoy.

SELECT ‘No rows’
WHERE 1=2
HAVING 1=1;

Foreign Keys against Views

Foreign Keys are great. For those of you who read some of my other posts, or heard me present on this material before, a foreign key can really help the Query Optimizer simplify out your query, because it knows that it must be able to find a match in an equality join (unless the field allows NULLs, which is a whole nother story).

I also blogged recently about the fact that a foreign key doesn’t have to reference the primary key of a table. Turns out it can reference any candidate key, that is, one that has a unique index on it. This presents all kinds of interesting design ideas, such as avoiding joins by storing a different field (such as the username, rather than userid, if the username is the one thing you keep looking up to display at the top of the webpage). Referential integrity can still be maintained happily using an index seek (just a slightly larger index, but quick nonetheless), but the benefit could be huge if many queries no longer need to do join to get that info.

But I found an interesting question on Stack Overflow, which was asking about whether a foreign key could reference a view. He was trying to do it for the purpose of data integrity – which would probably require an expensive trigger. But let’s talk about the Query Optimizer, which I think is another useful reason to have a foreign key hook into a view.

As a view is only a stored sub-query, it’s a strange request, but one that is definitely interesting. A view can be instantiated as an indexed view, so I thought there was definitely potential in his question.

Unfortunately I had no luck trying to help him.

My first thought was “Well, you’re going to need a unique index on the view – well that’s doable – a Clustered Index on an Indexed View.”, but you get an error saying that the foreign key needs to reference a user table. A user table? So it can’t even reference a system table. But I guess that’s fair enough these days, since we don’t really have system tables any more (they’re all system views, referencing underlying functions and the like).

My next thought was whether or not you could recreate something similar using a filtered index. After all, a filtered index satisfies many of the scenarios that indexed views often address. They can be unique for example, so you might use an indexed view to make sure that once a particular value is reached in a Status field, some other field must be unique. In SQL Server 2008 it’s no problem to create a unique index and provide a WHERE clause for it. But unfortunately when you create a foreign key, you indicate the table you’re referencing, not the index. I tried using a check constraint, to make sure that the foreign column could only contain values that were in the unique filtered index, but SQL just doesn’t consider that a filtered index is available for referencing by a foreign key.

If you know a way to help this guy, follow the link and provide an answer. In a couple of days’ time, the answer with the most votes will be automatically marked as the answer by the Stack Overflow system, which won’t help him much. He understands that using views doesn’t negate the FK constraints on the underlying tables, and I hope he can still work out a way of doing it without triggers, but at the end of the day, this other answer of mine, where I eventually say “Use triggers on the tables which contribute to your view” is probably the unfortunate truth.