A moving time…

For those of you who read my blog using RSS, you probably won’t even see this post, but either way.

I’ve decided to move my blog to sqlblog.com. I have reposted some of my recent posts, but will not be posting any further content here at msmvps.com.

Both the feed from msmvps.com and sqlblog.com use feedburner.com, so the old feed will continue working, but I recommend changing to the new one in case my blog disappears from msmvps one day.

So farewell, msmvps.com (and thanks Susan!), it’s been fun. I’m sure I will still have many posts which link back here…

T-SQL Tuesday #003 (Relationships): The round-up

Lots of blog posts for this month, for the first T-SQL Tuesday to leave the safe-haven of Adam Machanic’s blog. Some people obviously missed out, probably because they don’t read this blog, but I guess that’s the nature of the meme. I don’t know who is hosting next month yet, but I’ll be looking out for Adam to post something about it in early March.


All the posts had to appear as trackbacks or comments on the invitation to participate, but this post is a short summary for posterity.


As the second week of February involves Valentine’s Day (and a few days earlier, my wedding anniversary), I thought the topic of Relationships would be a nice one for this event. There was a good range of topics too, which I have ordered by the type of relationships chosen.


Special mention:


It wasn’t quite on the right day, but a big congratulations to Michelle Ufford who almost managed to time the birth of her new baby Chloe for T-SQL Tuesday.


We have “purely technical” posts:


Brad Schulz is keeping the bar way too high, and if you haven’t read his pieces from previous Tuesdays, then I recommend you go through the history of posts on his blog. This month he has written a letter of disappointment to the FROM clause. At some point Brad will likely be asked to compile these posts into a book, but until that happens, you’ll have to follow the link to his blog. It’s entertaining, but still fits in the “purely technical” category.


John Dunleavy demonstrated (complete with screenshots – something I should put more of in my posts) how foreign keys can be made so easily using the Diagram part of Management Studio. It’s not something I do much of, but I have to admit that reading posts like John’s can often inspire me to changing my ways.


Bryan Smith also talked about database diagrams, and how they can be used to discover relationships in a system.


Michael Coles officially missed the deadline, but I’m going to link to him anyway, demonstrating a nice trick for creating a Product aggregate based on the relationship between a number and its logarithm.


Allen White wrote about the fact that any RDBMS should have relationships to really be considered relational. Great reminder of some of the basics.


Marco Russo timed a piece on relating tables in DAX amazingly well, and only realised that it qualified for T-SQL Tuesday after he had initially posted it. Useful piece, which will appear in search engine results for years to come I’m sure.


Rob Farley wrote a pile of rubbish… hey, that’s me! I wrote about the importance of relationships in a database system to help the Query Optimizer do its job. I also surmised that Foreign Keys using candidate keys (rather than the primary key) might help many queries do away with needing joins. If you have thoughts on that I’d still like to hear them.


Some posts that were about less technical relationships:


Jen McCown wrote about how she left her husband Sean, in a piece called ‘I love you, I quit’. Actually, she was just leaving a job, but it’s a nice piece about the degradation of a employer-employee relationship. It rings too true for all of us I think, and I hope that as an employer I manage to ‘keep the mystery’ for my employees to stop them going through that same experience.


Jason Brimhall wrote about some of the different relationships in his life, particularly how he wants to make sure that the Parent-Child relationship in his life doesn’t become a Foreign relationship. Nicely done Jason.


Allen Kinsel recommends that relationships with professional organisations can be deepened, and that this can be very beneficial. I’m sure he’s right. I have a tendency to get extra-involved in groups, and I hope Allen’s sentiments are heard by many.


The link to Steve Jones’ article must’ve changed. The link that I followed today didn’t work, even though I’d read it successfully a couple of days ago. Still, I managed to find it, and I can thoroughly recommend reading about the relationships between Steve and his colleagues. He’s most definitely correct in pointing out that any effort in developing personal relationships with your colleagues will help you get things done!


And some “combination” posts:


Mike Walsh provided the first piece of the day, with an excellent run down of various types of relationships that are important, including a recommendation to read up on database design. Great advice, Mike. Thanks.


With Kalen Delaney’s post we almost got two for the price of one. A brilliant prelude talking about some of the personal relationships that have enhanced her SQL career, followed up by covering how foreign key relationships have developed through the versions of SQL Server. Kalen’s blog posts are always worth reading, as I’m sure everyone who is reading this post appreciates.


Stuart Ainsworth’s piece was on Maslow, drawing parallels between the hierarchy of needs of a person with the hierarchy of needs of a database system. It’s thought-provoking, and something that I feel could be made into a poster for database developers’ walls.


In summary


I’d like to thank everyone who has taken part, and for Adam for having introduced the T-SQL Tuesday concept to us. Keep your eye on his blog to find out what’s going on next month. If your name isn’t listed here, then I encourage you to write something for March.


Also remember that lots of these people are on Twitter and are very much followable. Look at the hashtag #tsql2sday for related posts, and make sure you follow the people who post blogs for these events.

The Query Optimizer’s handling of Relationships for T-SQL Tuesday #003

I’m feeling the pressure for this month’s T-SQL Tuesday, probably because I’m also the host. I’ll be posting a roll-up for it soon too, which I’m sure will be great fun researching.


Given that the topic is on relationships, and the main SQL Server database is a relational engine, relationships are incredibly relevant to databases. The idea behind RDBMSs is that keys are used to refer to entities. This leads to foreign keys, such that a particular column(s) is constrained to values which appear in another table, thus referential integrity is enforced. And yet there are so many database designs out there that do not have relationships defined between tables. I shudder when I find them, but that doesn’t make them any less commonplace.


In data warehouses, designed primarily for reporting systems rather than transactional systems, tables are often designed in a more denormalized manner, to avoid needing to perform so many joins to access the data required for reports. This involves having tables with many extra columns, containing data that would otherwise be stored in other tables. Fewer tables are used, and therefore the system has fewer relationships.


I sometimes wonder how this should affect relational database design. I have written before about the fact that the Query Optimizer can leverage foreign key relationships to be able to simplify queries by noticing that joins can be redundant and simplified out of plans, but to summarise:


A foreign key relationship is between a column (or set of columns) in a table to a unique key (typically the primary key) in another table (which could even be the same one). Because of this uniqueness, the relationship can map to at most one record on the other side. And because the foreign key relationship enforces referential integrity, it must map to exactly one record on the other side (a caveat being that the foreign key column(s) could be configured to allow NULLs, which won’t map). Therefore, a join that doesn’t actually select data from any of the columns in the second table might be able to be simplified out of the query completely, as if the query didn’t have the join at all. But read my other post for more on that.


Thinking about many of the queries that I’ve written over the years, I know that I often only want one field from the table I’m joining. For example, I might want to get a ProductName when I have a ProductID, or I might want the Login from a UserID. A standard Lookup situation, which in a data warehouse would often be handled by storing the Name in a dimension table rather than the ID.


So my surmising leads me to this question:


If there is a unique index on the field that I typically want to lookup from a table, does this make it a better candidate for foreign key relationships than the primary key, so that the system can avoid needing as many joins?


This screams against everything I ever learned about relational databases. It would obviously make for a larger row, but if this were offset by performance gains in querying, could it be worthwhile? Maintaining referential integrity based on a string field may be more costly than on an integer field, but I’m wondering if the impact on SELECT queries through the reduction of the number of joins required might not make it a worthy consideration.


Please note: I’m not saying it’s necessarily a good idea – I’m surmising here, and would love to hear comments about whether or not other people have tried it, what circumstances they were trying to handle, and whether or not the idea worked.


But back to the Query Optimizer…


The QO needs information to be able to work out how to run your query. It needs statistical information about the columns that are filtered; it needs to be able to figure out which parts of the query can utilise indexes effectively (see my recent posts about SARGability); but also very significantly, it needs to have information about the relationships between tables.


Any time you write a query that involves a join (which I imagine most of your queries do), the system can use information about the relationship between the two tables. If it is defined as a foreign key relationship that doesn’t allow NULL values, then the system knows that each record in the ‘child’ table must match exactly one record in the other table. The Query Optimizer can use this information, and it should be available. Any time you’re thinking of not putting enforced relationships in your system, consider the fact that you’re blinding the Query Optimizer and ultimately making it do more work.


Put your relationships into your database design. Put constraints where they can apply, mark items as unique. The Query Optimizer will thank you for it, expressing its thanks as better performance.

Invitation for T-SQL Tuesday #003: Relationships

It’s time for the third of Adam Machanic’s T-SQL Tuesdays, and this time, I’m the host. The first one, last December was on the topic of date/time, and the second was on Puzzling Situations. Check them both out, along with the round-ups that Adam wrote about them. Lots of great topics, which is starting to make me anticipate the content that comes out on the second Tuesday of each month.


As an early volunteer to host, I have been given the honour of being the first person chosen to host one. I’d like to claim that this implies some sort of special relationship between myself and the SQL community as a whole, but it’s actually just a “first-in, best dressed” policy – although the ‘best dressed’ analogy is lost on me.


Theme


Valentine’s Day is coming up. Hopefully I don’t need to tell you that it’s on February 14th, but if you’ve read this far into the post then perhaps you’re involved with databases for some reason and may need reminding. Shopping centres around the world have signs up reminding us to buy flowers for our loved ones, but I know many people in IT circles who don’t tend to go to such places, lurking in dark corners of houses until all hours of the night, surviving on pizza. Hopefully this theme will not only prompt some interesting posts, but also prompt people to go out and invest in the meaningful relationships in their own lives. Actually, if you don’t know that Valentine’s Day is February 14th, I’m guessing you don’t have anyone in your life worth buying for. ;)


For me, Valentine’s Day is only three days after my wedding anniversary, so I can’t forget either – as if I would.


So the theme for this month’s T-SQL Tuesday is Relationships.


There are a massive number of options you could go with for this theme. You could talk about Foreign Keys in the relational world. You could wax lyrical about the benefits of attribute relationships in cube design. You could write a poem for your loved one, apologising for all those hours spent in front of a Management Studio window, trying to tune a query, rather than tuning your guitar to serenade her.


Other ideas include: Relationships between Devs & DBAs, Clients & Vendors, Entities, data types, concepts (eg: Report Model & Cube), and more… if you’re struggling to think of something, drop me a line (twitter, Msgr, email, whatever – a list of contact options is over on the left) and I can help.


But so long as you can loosely tie your post to both the theme and some aspect of SQL Server, that’s fine. Be creative, informative, reflective, and hopefully relevant.


Rules


Please note that the time zone for this Tuesday is UTC. For me, that means between 10:30am Tuesday and 10:30am Wednesday. For you, it might mean some time on Monday afternoon to some time on Tuesday afternoon. If you’re lucky (read ‘English’), then you can publish your post any time on Tuesday. It’s about when it’s published though, not when you write it. I encourage you to write your post in advance, in case you’re busy on the day.


So the rules are:


  • Your post must go live after 00:00:00 UTC on Tuesday, February 9, 2010, but before 00:00:00 UTC on Wednesday, February 10, 2010
  • Your post must link back to this one, and it’s recommended that you clearly identify the post as a T-SQL Tuesday post
  • You are responsible for ensuring that a trackback or comment appears here so that I can find the posts

Follow the rules, and your post will be included in the roundup to be posted on a day or two later. Don’t follow the rules, and it won’t show up there. Simple as that!


Twitter


Follow the event on Twitter by watching for the #TSQL2sDay hash tag. (The ‘2’ refers to the fact that it’s the second Tuesday of the month, and nothing to do with the pronunciation of the word, which is more like “Choose-day” where I come from.)


Additional Notes


Please make sure you put a link in your post to this one, and post a comment here if the trackback doesn’t appear. I normally moderate comments to my blog (to avoid spam), but may consider changing that policy for 24 hours next week. I have noticed people trying to take part, but failing to make sure that a trackback/comment has appeared. So please check this.


Please check the time zone. It’s something that we’re used to here in Australia, but I know a few people missed out last month not realising that the event had shifted to UTC.


Let Adam Machanic know if you want to host. You can contact him in a variety of ways, as he mentioned last time. To host, you must have participated in two previous T-SQL Tuesday events, and your blog must have had at least one post a month for the prior six months. Let him know you’re keen even if you don’t meet these criteria, as I believe there’s a list, and you might be able to get those blog posts sorted before your turn is up. When I volunteered to host, I hadn’t participated in any yet…


If you want to host but don’t have the faintest clue on what topic to use, ask Adam for some ideas. I’ve suggested a few to him, and I think he’s putting a list together to maintain the longevity of all this. Similarly, let him know if you have some ideas – you might suggest something that catches people’s imagination like never before.


Feel free to contact Adam or myself if you have any comments or ideas for all this. I’ll probably refer you to Adam if it’s a general thing, or happily do my best to answer you if it’s about this month’s event in particular.


Have lots of fun! I look forward to reading your posts!


Edited: The round-up is at http://msmvps.com/blogs/robfarley/archive/2010/02/13/t-sql-tuesday-003-relationships-the-round-up.aspx

A CASE study in SARGability

A recent discussion on Twitter about a query that Denny Cherry was looking at led to this post by Josef Richberg: http://josef-richberg.squarespace.com/journal/2010/1/28/is-a-case-statement-considered-sargable.html, and I thought it might be worth going through a few points on the topic of SARGability. Particularly given that I wrote a related post recently myself.


If something is SARGable, it means it can be used as a Search Argument – that is, it can be used effectively in conjunction with an index.


To repeat a common analogy, trying to find “Farley” in the phonebook is a Seek (I can pretty much go straight there), whereas trying to find names which end in “arley”, is not. Those names could be just about anywhere.


The way that an Index Seek works is that the system can go to a record and tell whether the row(s) it’s looking for is to the left of the current value or to the right. Looking for “F” is easy. I open the book at “M” and know that I must go left. I open the book at C and know that I must go right. It’s SARGable.


As Josef writes, applying a function to a field stops it being SARGable. If we wanted to say WHERE RIGHT(Surname,5) = ‘arley’  we would quickly see a performance problem, as finding these records would mean starting the Aardvarks (in case there was an ‘Aardvarkarley’), and end with the ZZzzzs (in case there was a ‘ZZzzarley’). On the other hand, looking for WHERE Surname LIKE ‘Farl%’ is quick, because we see that evaluating surnames using a wildcard that doesn’t appear at the start doesn’t affect the SARGability.


At first glance, an ON clause such as:


ON g.GroupId = CASE
             WHEN c.GroupId <> l.GroupID
               AND l.GroupID <> @AccGroupID
             THEN l.GroupID
             ELSE c.GroupID
             END


…would seem to be nasty for SARGability. We’re not just looking for a row where GroupId = AParticularyGroupID, we’re looking for something that is the result of a CASE function, which is going to change for every combination of rows in the tables ‘c’ and ‘l’.


But actually, it’s a lot more complicated than that, and yet in some ways, a lot more simple.


Let’s think about the term SARGable again. It means that we’re able to use it as a Search Argument – but what is the ‘it’ we’re looking at? What is the thing that is SARGable? (This next bit is the key – so I’ll write it again in a moment, in bold) The thing that is SARGable is the term that is indexed – in other words, g.GroupId. It doesn’t matter how much work is required to work out that value – it’s whether or not you can find it in the index.


SARGability is about the thing that is (hopefully) indexed in the table (or set) that you’re introducing into the query.


In the phonebook, RIGHT(Surname,5) isn’t in the index. No index on Surname is going to help match RIGHT(Surname,5) (unless you want to index the result of that function). But if you wanted to say WHERE Surname = REVERSE(‘yelraF’) then there’s no problem. Despite how nasty REVERSE is, it doesn’t affect what we’re looking up in the book. We can evaluate REVERSE, and then (using the index) find the Surname that matches that. Surname = something is SARGable.


So the CASE clause above is largely irrelevant to finding something by GroupId. The result of the CASE can be easily worked out, and then a seek done on an index on the ‘g’ table.


That was the simple aspect of it.


Consider that you have a diary, and you want to mark a week before some birthdays, so that you can go shopping. Suppose you have Jul 11, Feb 6, Dec 10. You take one of the birthdays, work out the date that is a week before it, then do an Index Seek to find that date in the diary. Then you repeat it twice more. This query might be along the lines of:


FROM people p JOIN diary d ON d.diarydate = dateadd(week, -1, p.birthday)


You’ll notice that we happily did an Index Seek on the diary. The dateadd function on p.birthday had no effect on us. The SARGability applies on whichever table/set we’re thinking about at the time, which in this scenario is the diary.


But consider that we wrote the query like this, which essentially means the same, looking for birthdays and diary dates that are a week apart, but applying the function to d.diarydate instead:


FROM people p JOIN diary d ON p.birthday = dateadd(week, 1, d.diarydate)


Logically, this means the same. But this is much more suited to finding the birthday once we know the diarydate, rather than the other way around. This might be fine if we had a thousands of people to consider, and we wanted to look up a mere 365 diary dates in our list of people (indexed by birthday)… but we’d still be looking 365 dates. The function still kills the SARGability of the column, it’s a question of which column we want to be considered indexable.


Repeating the matching process over and over like this is known as a Nested Loop. As quick as it might be to find the record because of SARGability, we’d still be doing it 365 times. Things would be much better if we could index both sides in this scenario. Then, we might be able to utilise a different kind of join, such as a Merge Join, which involves running through two ordered sets, comparing the values to find matches in a single pass. But for this to apply, both sides must be SARGable. Consider two tables with numbers in them… one with 1, 1, 5, 10, 10, 200, and one with 3, 4, 5, 6, 7, 8, 8, 10. To do a Merge Join between them, the system will see that the first table starts with 1, while the second one starts with 3. It will skip through the 1s in the first table to the 5, and then start skipping through the second table to find matches. It’s very fast, but it needs to be able to know whether the value it’s looking for is to the left or right of where it’s up to. This would involve an Index Scan on both sides, but it would be quicker than doing lots of Index Seeks. A single seek is faster than a single scan, but something quick done over and over can take longer than using a slow method one time.


Unfortunately, the Query Optimizer currently isn’t smart enough to know that dateadd(week, 1, d.diarydate) doesn’t change the order of dates in the diary, and that it can easily tell whether it should go left or right. That’s the topic of my earlier blog post, and a Connect item to ask that Microsoft fix this.


So should you care about SARGability? After all, if the system is doing an Index Seek anyway, you’ve got SARGability in play, and it shouldn’t matter.


You should try to arm the Query Optimizer with as many options as possible, so that it can use the best plan available, based on the statistics. We might be happy with looking up three diary dates in a Nested Loop, but we wouldn’t be happy doing that thousands of times. Think about how YOU would solve the problem without a computer (like my diary analogy, or the phonebook). If your execution plan is similar to your paper-based solution, then you’ve done well. If it’s not, then maybe you should look into it some more and work out if the query is okay or not.


In Josef’s situation, the Index Seek was being done 1155 times. That might be okay, but it also might not be. But should the query be rewritten? That’s a different question, that I couldn’t really answer without knowing more. The most important thing about a query is not its performance, but its correctness. If you need to kill the SARGability to maintain correctness, then so be it. You can always handle it with an indexed computed column if you need to, or even an indexed view.

SARGable functions in SQL Server

SARGable is an adjective in SQL that means that an item can be found using an index (assuming one exists). Understanding SARGability can really impact your ability to have well-performing queries. Incidentally – SARGable is short for Search ARGument Able.


If you have an index on phone numbers using LastName, followed by FirstName, including the suburb and address fields, you have something akin to the phone book. Obviously it becomes very easy to find people with the surname “Farley”, with the first name “Rob”, but often you want to search for people with the surname “Farley” with the first name beginning in ‘R’. I might be listed as “R Farley”, “R J Farley”, “Rob Farley”, “Robert Farley”, “Robert J. Farley”, or a few other variations. It complicates things even more if you need to find someone with a name that shortens a different way, like John/Jack, or Elizabeth/Betty. This is where SARGability comes into play.


Let’s just think about the First names for a minute.


If you want to find all the names that start with R, that’s easy. They’re all together and you can get to them very quickly. This is comparable to a query in SQL Server like this, (taking advantage of the index on the Name column in Production.Product)


select Name, ProductID
from Production.Product
where Name like ‘R%’ ;


Looking in the Execution Plan, we see an Index Seek to find the 52 rows, and the seek has a Seek Predicate like this (by looking in either the ToolTip of the operator, the Properties window, or the XML itself):


Seek Keys[1]: Start: [AdventureWorks].[Production].[Product].Name >= Scalar Operator(N’R’), End: [AdventureWorks].[Production].[Product].Name < Scalar Operator(N’S’)


This shows that the system looks as the LIKE call, and translates it into a greater-than and less-than query. (Interestingly, have a look at the End Seek Key if you tell it to find entries that start with Z)


So the LIKE operator seems to maintain SARGability.


If we want to consider Names that have R for the first letter, this is essentially the same question. Query-wise, it’s:


select Name, ProductID
from Production.Product
where LEFT(Name,1) = ‘R’ ;


Unfortunately the LEFT function kills the SARGability. The Execution Plan for this query shows an Index Scan (starting on page one and going to the end), with the Predicate (not, not Seek Predicate, just Predicate) “substring([AdventureWorks].[Production].[Product].[Name],(1),(1))=N’R’”. This is bad.


You see, a Predicate is checked for every row, whereas a Seek Predicate is used to seek through the index to find the rows of interest. If an Index Seek operator has both a Predicate and a Seek Predicate, then the Predicate is acting as an additional filter on the rows that the Seek (using the Seek Predicate) has returned. You can see this by using LIKE ‘R%r’


Considering the first part of a string doesn’t change the order. SQL knows this because of the way it handles LIKE (if the left of the string is known), but it doesn’t seem to get this if LEFT is used. It also doesn’t get it if you manipulate a field in other ways that we understand don’t affect the order.


select ProductID
from Production.Product
where ProductID + 1 = 901;


This is doing a scan, checking every row, even though we can easily understand what we mean. The same would apply for this query (assuming there’s an index on OrderDate):


select OrderDate
from Sales.SalesOrderHeader
where dateadd(day,1,OrderDate) = ’20040101′
;


And perhaps most significantly:


select OrderDate
from Sales.SalesOrderHeader
where dateadd(day,datediff(day,0,OrderDate),0) = ’20040101′
;


…which is largely recognised as being an effective method for date truncation (and why you should always compare dates using >= and < instead)


But more interestingly…


…this query is just fine. Perfectly SARGable.


select OrderDate
from Sales.SalesOrderHeader
where cast(OrderDate as date) = ’20040101′
;


This query does a little work to figure out a couple constants (presumably one of them being the date 20040101, and another being 20040102), and then does an Index Seek to get the data.


You see, the date and datetime fields are known to have a special relationship. The date type is essentially the left-most three bytes of a datetime type, and therefore the ordering is identical.


It doesn’t work if you want to do something like:


select OrderDate
from Sales.SalesOrderHeader
where convert(char(8), OrderDate, 112) = ’20040101′
;


…but did you really think it would? There’s no relationship between strings and dates.


I wish it did though. I wish the SQL team would go through every function and think about how they work. I understand that CONVERT will often change the order, but convert using style 112 won’t.


Also, putting a constant string on the end of a constant-length string shouldn’t change the order. So really, this should be able to work:


select OrderDate
from Sales.SalesOrderHeader
where convert(char(6), OrderDate, 112) + ’01′ = ’20040101′
;


But it doesn’t.


Interestingly (and a prompt for this post), the hierarchyid type isn’t too bad. It understands that some functions, such as getting the Ancestor won’t change the order, and it keeps it SARGable. Here the asker had noticed that GetAncestor and IsDescendantOf are functions that don’t kill the SARGability – basically because the left-most bits of a hierarchyid are the parent nodes.


http://stackoverflow.com/questions/2042826/how-does-an-index-work-on-a-sql-user-defined-type-udt


Spatial types can show similar behaviour.


So I get the feeling that one day we might see the SQL Server team implement some changes with the optimizer, so that it can handle a lot more functions in a SARGable way. Imagine how much code would run so much better if order-preserving functions were more widely recognised. Suddenly, large amounts of code that wasn’t written with SARGability in mind would start running quicker, and we’d all be hailing the new version of SQL Server.


I’ve raised a Connect item about this, at https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=526431


You may have code that would run thousands of times faster with this change. That code may live in third party applications over which you have no control at all. If you think there’s a chance you fall into that bracket, why not go and vote this up?

T-SQL Tuesday – HAVING Puzzle answer

Earlier today you may have seen a blog post of mine about a puzzle involving HAVING. You should read that post before this one. It was part of Adam Machanic’s T-SQL Tuesday meme.

The question was about the query:

SELECT ‘No Rows’
WHERE 1=2
HAVING 1=1;

And here’s the explanation.

Start by making yourself a "dual table", like what you’d use in Oracle, and use this instead of having no FROM clause. Put a row in it.

CREATE TABLE dual (dummy bit);
INSERT dual VALUES (1);

–Now count the rows in it
SELECT COUNT(*)
FROM dual;

–Now count how many rows don’t match 1=2 (of course, the answer is zero)
SELECT COUNT(*)
FROM dual
WHERE 1=2;

–Naturally we’d get nothing back if we weren’t grouping
SELECT ‘Something’
FROM dual
WHERE 1=2;

–But HAVING forces the grouping functionality as well (like using COUNT(*))
SELECT ‘Something’
FROM dual
WHERE 1=2
HAVING 1=1;

–So in this query, we couldn’t put any of our real columns in, only aggregate functions and constants
SELECT *
–Errors
FROM dual
WHERE 1=2
HAVING 1=1;

–And leaving out the FROM clause implies that we’re asking all this of a secret internal table with a single row. All these queries work just the same without the FROM clause at all.

–Count the rows in our pretend table (one)
SELECT COUNT(*)

–Now count how many rows don’t match 1=2 (zero)
SELECT COUNT(*)
WHERE 1=2;

–Naturally we’d get nothing back if we weren’t grouping
SELECT ‘Something’
WHERE 1=2;

–But HAVING forces the grouping functionality as well
SELECT ‘Something’
WHERE 1=2
HAVING 1=1;

So the answer to the question posed is that you get a single row, containing the text provided. The fact that I used the text ‘No Rows’ was just a bit of fun.

Now, to remove the trivia a little…

When would you ever use HAVING without GROUP BY in a practical situation?

How about this:

Using sp_MSforeachdb, find the number of objects in non-system databases. It’s an undocumented system stored procedure which runs a query on each database, replacing a question mark in the query with the name of the database. It can be quite handy, just don’t look at how it’s implemented.

EXEC sp_MSforeachdb ‘SELECT ”?”, COUNT(*) FROM ?.sys.objects WHERE ”?” NOT IN (”master”,”tempdb”,”model”,”msdb”);’;

But this won’t do it. It will still return the entries for the system databases, but with zeroes (because none of the objects satisfied the WHERE clause). Replace WHERE with HAVING and it’s just fine – the rows get eliminated from the resultset.

EXEC sp_MSforeachdb ‘SELECT ”?”, COUNT(*) FROM ?.sys.objects HAVING ”?” NOT IN (”master”,”tempdb”,”model”,”msdb”);’;

Honestly, HAVING doesn’t require a GROUP BY clause. It doesn’t require anything. It filters based on groups, and if there are no groups yet, it makes some – like how using an aggregate will count the rows in an empty set and return one row representing that group.

It’s generally taught as "HAVING is for filtering based on aggregates", and that’s true, but only half the story. And I find that if I’m teaching people to write better queries, I want them to have a thorough understanding of what each construct is really doing.

T-SQL Tuesday – T-SQL Puzzle with HAVING

Adam’s hosting another T-SQL Tuesday, for which this post is jumping in. He’s themed it around T-SQL Puzzles, which I found quite interesting, because the world is full of them.

Most of the questions that I answer on forums, help sites, and so on, are puzzles. I guess there’s the difference between “Problem” and “Puzzle”, but I prefer to think of thing as puzzles.

For Adam’s meme though, I thought I’d share a Puzzle that I ask students who take my Advanced T-SQL course. The idea is to have them start thinking about what each component of T-SQL is actually doing, so that they can better address problems they face. If you have a rifle, it’s nice to actually know what the various components of it are for, so that you can use it more effectively.

I actually ask them a large number of things, but the one that I thought I’d pose for you all today is about the results of this. The answer will be in the next blog post, which hopefully you haven’t read yet. I will have them both published on Tuesday 12th, this one at the start of the day, and the answer towards the end of the day.

The question is simply a query. Can you predict the output, and explain why? Feel free to comment to your heart’s content, as I will moderate them and only publish them afterwards. In fact, I’ll probably take a few days to get to them (being holiday period), so I apologise if you’re wanting to read what other people thought too.

Naturally, you can check your answer by actually running the query, but please provide your thoughts before you do. The query is below. There is no FROM clause. There is no GROUP BY clause. Does it error, do you get an empty resultset, do you get a single row containing NULL, do you get a single row with data, do you get multiple rows, or something else I haven’t suggested? Enjoy.

SELECT ‘No rows’
WHERE 1=2
HAVING 1=1;

Foreign Keys against Views

Foreign Keys are great. For those of you who read some of my other posts, or heard me present on this material before, a foreign key can really help the Query Optimizer simplify out your query, because it knows that it must be able to find a match in an equality join (unless the field allows NULLs, which is a whole nother story).

I also blogged recently about the fact that a foreign key doesn’t have to reference the primary key of a table. Turns out it can reference any candidate key, that is, one that has a unique index on it. This presents all kinds of interesting design ideas, such as avoiding joins by storing a different field (such as the username, rather than userid, if the username is the one thing you keep looking up to display at the top of the webpage). Referential integrity can still be maintained happily using an index seek (just a slightly larger index, but quick nonetheless), but the benefit could be huge if many queries no longer need to do join to get that info.

But I found an interesting question on Stack Overflow, which was asking about whether a foreign key could reference a view. He was trying to do it for the purpose of data integrity – which would probably require an expensive trigger. But let’s talk about the Query Optimizer, which I think is another useful reason to have a foreign key hook into a view.

As a view is only a stored sub-query, it’s a strange request, but one that is definitely interesting. A view can be instantiated as an indexed view, so I thought there was definitely potential in his question.

Unfortunately I had no luck trying to help him.

My first thought was “Well, you’re going to need a unique index on the view – well that’s doable – a Clustered Index on an Indexed View.”, but you get an error saying that the foreign key needs to reference a user table. A user table? So it can’t even reference a system table. But I guess that’s fair enough these days, since we don’t really have system tables any more (they’re all system views, referencing underlying functions and the like).

My next thought was whether or not you could recreate something similar using a filtered index. After all, a filtered index satisfies many of the scenarios that indexed views often address. They can be unique for example, so you might use an indexed view to make sure that once a particular value is reached in a Status field, some other field must be unique. In SQL Server 2008 it’s no problem to create a unique index and provide a WHERE clause for it. But unfortunately when you create a foreign key, you indicate the table you’re referencing, not the index. I tried using a check constraint, to make sure that the foreign column could only contain values that were in the unique filtered index, but SQL just doesn’t consider that a filtered index is available for referencing by a foreign key.

If you know a way to help this guy, follow the link and provide an answer. In a couple of days’ time, the answer with the most votes will be automatically marked as the answer by the Stack Overflow system, which won’t help him much. He understands that using views doesn’t negate the FK constraints on the underlying tables, and I hope he can still work out a way of doing it without triggers, but at the end of the day, this other answer of mine, where I eventually say “Use triggers on the tables which contribute to your view” is probably the unfortunate truth.

Apostrophes around column aliases

Far too often I see SQL code which uses apostrophes around column aliases, like this:


SELECT ProductID, SUM(LineTotal) AS ‘Total’
FROM Sales.SalesOrderDetail
GROUP BY ProductID;


This is fine, but the worrying thing about this is if the user decides to use this alias in an outer query.


SELECT ‘ID’, ‘Total’
FROM
(
SELECT ProductID AS ‘ID’, SUM(LineTotal) AS ‘Total’
FROM Sales.SalesOrderDetail
GROUP BY ProductID
) AS p
;


Here, the outer query will assume that ‘ID’ and ‘Total’ are strings, not the names of columns in sub-query. It’s really not pretty. The query runs, but doesn’t give the correct answers. Furthermore, if this had’ve been done in a GROUP BY clause, or a JOIN, etc, then the error may have been hidden some more. An error might have occurred, but only in certain circumstances.


What should have been done is to have used square brackets, like [Total], or even no brackets at all. Using the table alias in the outer query would have helped too.


SELECT p.ID, p.Total
FROM
(
SELECT ProductID AS ID, SUM(LineTotal) AS Total
FROM Sales.SalesOrderDetail
GROUP BY ProductID
) AS p
;


I’m not sure why SQL Server allows apostrophes to be used around column aliases, but unfortunately it does (and because it does now, it likely always will, if only to maintain backward-compatibility). So instead, any time you see code that uses apostrophes this way, please change it – just to help any developers that come after you who don’t understand where things can fall down.

Just another Microsoft MVPs site

Network-wide options by YD - Freelance Wordpress Developer