Category Archives: 12338

T-SQL Tuesday #003 (Relationships): The round-up

Lots of blog posts for this month, for the first T-SQL Tuesday to leave the safe-haven of Adam Machanic’s blog. Some people obviously missed out, probably because they don’t read this blog, but I guess that’s the nature of the meme. I don’t know who is hosting next month yet, but I’ll be looking out for Adam to post something about it in early March.

All the posts had to appear as trackbacks or comments on the invitation to participate, but this post is a short summary for posterity.

As the second week of February involves Valentine’s Day (and a few days earlier, my wedding anniversary), I thought the topic of Relationships would be a nice one for this event. There was a good range of topics too, which I have ordered by the type of relationships chosen.

Special mention:

It wasn’t quite on the right day, but a big congratulations to Michelle Ufford who almost managed to time the birth of her new baby Chloe for T-SQL Tuesday.

We have “purely technical” posts:

Brad Schulz is keeping the bar way too high, and if you haven’t read his pieces from previous Tuesdays, then I recommend you go through the history of posts on his blog. This month he has written a letter of disappointment to the FROM clause. At some point Brad will likely be asked to compile these posts into a book, but until that happens, you’ll have to follow the link to his blog. It’s entertaining, but still fits in the “purely technical” category.

John Dunleavy demonstrated (complete with screenshots – something I should put more of in my posts) how foreign keys can be made so easily using the Diagram part of Management Studio. It’s not something I do much of, but I have to admit that reading posts like John’s can often inspire me to changing my ways.

Bryan Smith also talked about database diagrams, and how they can be used to discover relationships in a system.

Michael Coles officially missed the deadline, but I’m going to link to him anyway, demonstrating a nice trick for creating a Product aggregate based on the relationship between a number and its logarithm.

Allen White wrote about the fact that any RDBMS should have relationships to really be considered relational. Great reminder of some of the basics.

Marco Russo timed a piece on relating tables in DAX amazingly well, and only realised that it qualified for T-SQL Tuesday after he had initially posted it. Useful piece, which will appear in search engine results for years to come I’m sure.

Rob Farley wrote a pile of rubbish… hey, that’s me! I wrote about the importance of relationships in a database system to help the Query Optimizer do its job. I also surmised that Foreign Keys using candidate keys (rather than the primary key) might help many queries do away with needing joins. If you have thoughts on that I’d still like to hear them.

Some posts that were about less technical relationships:

Jen McCown wrote about how she left her husband Sean, in a piece called ‘I love you, I quit’. Actually, she was just leaving a job, but it’s a nice piece about the degradation of a employer-employee relationship. It rings too true for all of us I think, and I hope that as an employer I manage to ‘keep the mystery’ for my employees to stop them going through that same experience.

Jason Brimhall wrote about some of the different relationships in his life, particularly how he wants to make sure that the Parent-Child relationship in his life doesn’t become a Foreign relationship. Nicely done Jason.

Allen Kinsel recommends that relationships with professional organisations can be deepened, and that this can be very beneficial. I’m sure he’s right. I have a tendency to get extra-involved in groups, and I hope Allen’s sentiments are heard by many.

The link to Steve Jones’ article must’ve changed. The link that I followed today didn’t work, even though I’d read it successfully a couple of days ago. Still, I managed to find it, and I can thoroughly recommend reading about the relationships between Steve and his colleagues. He’s most definitely correct in pointing out that any effort in developing personal relationships with your colleagues will help you get things done!

And some “combination” posts:

Mike Walsh provided the first piece of the day, with an excellent run down of various types of relationships that are important, including a recommendation to read up on database design. Great advice, Mike. Thanks.

With Kalen Delaney’s post we almost got two for the price of one. A brilliant prelude talking about some of the personal relationships that have enhanced her SQL career, followed up by covering how foreign key relationships have developed through the versions of SQL Server. Kalen’s blog posts are always worth reading, as I’m sure everyone who is reading this post appreciates.

Stuart Ainsworth’s piece was on Maslow, drawing parallels between the hierarchy of needs of a person with the hierarchy of needs of a database system. It’s thought-provoking, and something that I feel could be made into a poster for database developers’ walls.

In summary

I’d like to thank everyone who has taken part, and for Adam for having introduced the T-SQL Tuesday concept to us. Keep your eye on his blog to find out what’s going on next month. If your name isn’t listed here, then I encourage you to write something for March.

Also remember that lots of these people are on Twitter and are very much followable. Look at the hashtag #tsql2sday for related posts, and make sure you follow the people who post blogs for these events.

The Query Optimizer’s handling of Relationships for T-SQL Tuesday #003

I’m feeling the pressure for this month’s T-SQL Tuesday, probably because I’m also the host. I’ll be posting a roll-up for it soon too, which I’m sure will be great fun researching.

Given that the topic is on relationships, and the main SQL Server database is a relational engine, relationships are incredibly relevant to databases. The idea behind RDBMSs is that keys are used to refer to entities. This leads to foreign keys, such that a particular column(s) is constrained to values which appear in another table, thus referential integrity is enforced. And yet there are so many database designs out there that do not have relationships defined between tables. I shudder when I find them, but that doesn’t make them any less commonplace.

In data warehouses, designed primarily for reporting systems rather than transactional systems, tables are often designed in a more denormalized manner, to avoid needing to perform so many joins to access the data required for reports. This involves having tables with many extra columns, containing data that would otherwise be stored in other tables. Fewer tables are used, and therefore the system has fewer relationships.

I sometimes wonder how this should affect relational database design. I have written before about the fact that the Query Optimizer can leverage foreign key relationships to be able to simplify queries by noticing that joins can be redundant and simplified out of plans, but to summarise:

A foreign key relationship is between a column (or set of columns) in a table to a unique key (typically the primary key) in another table (which could even be the same one). Because of this uniqueness, the relationship can map to at most one record on the other side. And because the foreign key relationship enforces referential integrity, it must map to exactly one record on the other side (a caveat being that the foreign key column(s) could be configured to allow NULLs, which won’t map). Therefore, a join that doesn’t actually select data from any of the columns in the second table might be able to be simplified out of the query completely, as if the query didn’t have the join at all. But read my other post for more on that.

Thinking about many of the queries that I’ve written over the years, I know that I often only want one field from the table I’m joining. For example, I might want to get a ProductName when I have a ProductID, or I might want the Login from a UserID. A standard Lookup situation, which in a data warehouse would often be handled by storing the Name in a dimension table rather than the ID.

So my surmising leads me to this question:

If there is a unique index on the field that I typically want to lookup from a table, does this make it a better candidate for foreign key relationships than the primary key, so that the system can avoid needing as many joins?

This screams against everything I ever learned about relational databases. It would obviously make for a larger row, but if this were offset by performance gains in querying, could it be worthwhile? Maintaining referential integrity based on a string field may be more costly than on an integer field, but I’m wondering if the impact on SELECT queries through the reduction of the number of joins required might not make it a worthy consideration.

Please note: I’m not saying it’s necessarily a good idea – I’m surmising here, and would love to hear comments about whether or not other people have tried it, what circumstances they were trying to handle, and whether or not the idea worked.

But back to the Query Optimizer…

The QO needs information to be able to work out how to run your query. It needs statistical information about the columns that are filtered; it needs to be able to figure out which parts of the query can utilise indexes effectively (see my recent posts about SARGability); but also very significantly, it needs to have information about the relationships between tables.

Any time you write a query that involves a join (which I imagine most of your queries do), the system can use information about the relationship between the two tables. If it is defined as a foreign key relationship that doesn’t allow NULL values, then the system knows that each record in the ‘child’ table must match exactly one record in the other table. The Query Optimizer can use this information, and it should be available. Any time you’re thinking of not putting enforced relationships in your system, consider the fact that you’re blinding the Query Optimizer and ultimately making it do more work.

Put your relationships into your database design. Put constraints where they can apply, mark items as unique. The Query Optimizer will thank you for it, expressing its thanks as better performance.

Invitation for T-SQL Tuesday #003: Relationships

It’s time for the third of Adam Machanic’s T-SQL Tuesdays, and this time, I’m the host. The first one, last December was on the topic of date/time, and the second was on Puzzling Situations. Check them both out, along with the round-ups that Adam wrote about them. Lots of great topics, which is starting to make me anticipate the content that comes out on the second Tuesday of each month.

As an early volunteer to host, I have been given the honour of being the first person chosen to host one. I’d like to claim that this implies some sort of special relationship between myself and the SQL community as a whole, but it’s actually just a “first-in, best dressed” policy – although the ‘best dressed’ analogy is lost on me.

Theme

Valentine’s Day is coming up. Hopefully I don’t need to tell you that it’s on February 14th, but if you’ve read this far into the post then perhaps you’re involved with databases for some reason and may need reminding. Shopping centres around the world have signs up reminding us to buy flowers for our loved ones, but I know many people in IT circles who don’t tend to go to such places, lurking in dark corners of houses until all hours of the night, surviving on pizza. Hopefully this theme will not only prompt some interesting posts, but also prompt people to go out and invest in the meaningful relationships in their own lives. Actually, if you don’t know that Valentine’s Day is February 14th, I’m guessing you don’t have anyone in your life worth buying for. 😉

For me, Valentine’s Day is only three days after my wedding anniversary, so I can’t forget either – as if I would.

So the theme for this month’s T-SQL Tuesday is Relationships.

There are a massive number of options you could go with for this theme. You could talk about Foreign Keys in the relational world. You could wax lyrical about the benefits of attribute relationships in cube design. You could write a poem for your loved one, apologising for all those hours spent in front of a Management Studio window, trying to tune a query, rather than tuning your guitar to serenade her.

Other ideas include: Relationships between Devs & DBAs, Clients & Vendors, Entities, data types, concepts (eg: Report Model & Cube), and more… if you’re struggling to think of something, drop me a line (twitter, Msgr, email, whatever – a list of contact options is over on the left) and I can help.

But so long as you can loosely tie your post to both the theme and some aspect of SQL Server, that’s fine. Be creative, informative, reflective, and hopefully relevant.

Rules

Please note that the time zone for this Tuesday is UTC. For me, that means between 10:30am Tuesday and 10:30am Wednesday. For you, it might mean some time on Monday afternoon to some time on Tuesday afternoon. If you’re lucky (read ‘English’), then you can publish your post any time on Tuesday. It’s about when it’s published though, not when you write it. I encourage you to write your post in advance, in case you’re busy on the day.

So the rules are:

  • Your post must go live after 00:00:00 UTC on Tuesday, February 9, 2010, but before 00:00:00 UTC on Wednesday, February 10, 2010
  • Your post must link back to this one, and it’s recommended that you clearly identify the post as a T-SQL Tuesday post
  • You are responsible for ensuring that a trackback or comment appears here so that I can find the posts

Follow the rules, and your post will be included in the roundup to be posted on a day or two later. Don’t follow the rules, and it won’t show up there. Simple as that!

Twitter

Follow the event on Twitter by watching for the #TSQL2sDay hash tag. (The ‘2’ refers to the fact that it’s the second Tuesday of the month, and nothing to do with the pronunciation of the word, which is more like “Choose-day” where I come from.)

Additional Notes

Please make sure you put a link in your post to this one, and post a comment here if the trackback doesn’t appear. I normally moderate comments to my blog (to avoid spam), but may consider changing that policy for 24 hours next week. I have noticed people trying to take part, but failing to make sure that a trackback/comment has appeared. So please check this.

Please check the time zone. It’s something that we’re used to here in Australia, but I know a few people missed out last month not realising that the event had shifted to UTC.

Let Adam Machanic know if you want to host. You can contact him in a variety of ways, as he mentioned last time. To host, you must have participated in two previous T-SQL Tuesday events, and your blog must have had at least one post a month for the prior six months. Let him know you’re keen even if you don’t meet these criteria, as I believe there’s a list, and you might be able to get those blog posts sorted before your turn is up. When I volunteered to host, I hadn’t participated in any yet…

If you want to host but don’t have the faintest clue on what topic to use, ask Adam for some ideas. I’ve suggested a few to him, and I think he’s putting a list together to maintain the longevity of all this. Similarly, let him know if you have some ideas – you might suggest something that catches people’s imagination like never before.

Feel free to contact Adam or myself if you have any comments or ideas for all this. I’ll probably refer you to Adam if it’s a general thing, or happily do my best to answer you if it’s about this month’s event in particular.

Have lots of fun! I look forward to reading your posts!

Edited: The round-up is at http://msmvps.com/blogs/robfarley/archive/2010/02/13/t-sql-tuesday-003-relationships-the-round-up.aspx

T-SQL Tuesday – HAVING Puzzle answer

Earlier today you may have seen a blog post of mine about a puzzle involving HAVING. You should read that post before this one. It was part of Adam Machanic’s T-SQL Tuesday meme.

The question was about the query:

SELECT ‘No Rows’
WHERE 1=2
HAVING 1=1;

And here’s the explanation.

Start by making yourself a "dual table", like what you’d use in Oracle, and use this instead of having no FROM clause. Put a row in it.

CREATE TABLE dual (dummy bit);
INSERT dual VALUES (1);

–Now count the rows in it
SELECT COUNT(*)
FROM dual;

–Now count how many rows don’t match 1=2 (of course, the answer is zero)
SELECT COUNT(*)
FROM dual
WHERE 1=2;

–Naturally we’d get nothing back if we weren’t grouping
SELECT ‘Something’
FROM dual
WHERE 1=2;

–But HAVING forces the grouping functionality as well (like using COUNT(*))
SELECT ‘Something’
FROM dual
WHERE 1=2
HAVING 1=1;

–So in this query, we couldn’t put any of our real columns in, only aggregate functions and constants
SELECT *
–Errors
FROM dual
WHERE 1=2
HAVING 1=1;

–And leaving out the FROM clause implies that we’re asking all this of a secret internal table with a single row. All these queries work just the same without the FROM clause at all.

–Count the rows in our pretend table (one)
SELECT COUNT(*)

–Now count how many rows don’t match 1=2 (zero)
SELECT COUNT(*)
WHERE 1=2;

–Naturally we’d get nothing back if we weren’t grouping
SELECT ‘Something’
WHERE 1=2;

–But HAVING forces the grouping functionality as well
SELECT ‘Something’
WHERE 1=2
HAVING 1=1;

So the answer to the question posed is that you get a single row, containing the text provided. The fact that I used the text ‘No Rows’ was just a bit of fun.

Now, to remove the trivia a little…

When would you ever use HAVING without GROUP BY in a practical situation?

How about this:

Using sp_MSforeachdb, find the number of objects in non-system databases. It’s an undocumented system stored procedure which runs a query on each database, replacing a question mark in the query with the name of the database. It can be quite handy, just don’t look at how it’s implemented.

EXEC sp_MSforeachdb ‘SELECT ”?”, COUNT(*) FROM ?.sys.objects WHERE ”?” NOT IN (”master”,”tempdb”,”model”,”msdb”);’;

But this won’t do it. It will still return the entries for the system databases, but with zeroes (because none of the objects satisfied the WHERE clause). Replace WHERE with HAVING and it’s just fine – the rows get eliminated from the resultset.

EXEC sp_MSforeachdb ‘SELECT ”?”, COUNT(*) FROM ?.sys.objects HAVING ”?” NOT IN (”master”,”tempdb”,”model”,”msdb”);’;

Honestly, HAVING doesn’t require a GROUP BY clause. It doesn’t require anything. It filters based on groups, and if there are no groups yet, it makes some – like how using an aggregate will count the rows in an empty set and return one row representing that group.

It’s generally taught as "HAVING is for filtering based on aggregates", and that’s true, but only half the story. And I find that if I’m teaching people to write better queries, I want them to have a thorough understanding of what each construct is really doing.

T-SQL Tuesday – T-SQL Puzzle with HAVING

Adam’s hosting another T-SQL Tuesday, for which this post is jumping in. He’s themed it around T-SQL Puzzles, which I found quite interesting, because the world is full of them.

Most of the questions that I answer on forums, help sites, and so on, are puzzles. I guess there’s the difference between “Problem” and “Puzzle”, but I prefer to think of thing as puzzles.

For Adam’s meme though, I thought I’d share a Puzzle that I ask students who take my Advanced T-SQL course. The idea is to have them start thinking about what each component of T-SQL is actually doing, so that they can better address problems they face. If you have a rifle, it’s nice to actually know what the various components of it are for, so that you can use it more effectively.

I actually ask them a large number of things, but the one that I thought I’d pose for you all today is about the results of this. The answer will be in the next blog post, which hopefully you haven’t read yet. I will have them both published on Tuesday 12th, this one at the start of the day, and the answer towards the end of the day.

The question is simply a query. Can you predict the output, and explain why? Feel free to comment to your heart’s content, as I will moderate them and only publish them afterwards. In fact, I’ll probably take a few days to get to them (being holiday period), so I apologise if you’re wanting to read what other people thought too.

Naturally, you can check your answer by actually running the query, but please provide your thoughts before you do. The query is below. There is no FROM clause. There is no GROUP BY clause. Does it error, do you get an empty resultset, do you get a single row containing NULL, do you get a single row with data, do you get multiple rows, or something else I haven’t suggested? Enjoy.

SELECT ‘No rows’
WHERE 1=2
HAVING 1=1;

T-SQL Tuesday – A date dimension table with computed columns

Quite a few people have asked me to blog about what I do for a date dimension table. I’m talking about a table that Analysis Services references for a Time dimension. It’s going to contain every date in a particular range, and be flexible enough to cater for public holidays and other custom details.

There are plenty of options for this, and I’ll mention some of them a bit later. What I use most of the time is an actual table in the Data Warehouse, which I populate with a row for each date in the range I want to consider. This range starts well before the earliest date I could want, and I don’t leave gaps either. Some people like to only use dates that have fact data, but I prefer to have the dates going back as far as I like.

Let’s talk about what the table looks like, and then how it can be created.

I have a primary key on an integer based on the date, in the format YYYYMMDD. So today would have the number 20091208. I haven’t tried using the date type that’s available in SQL Server 2008 for a date dimension yet – I generally try to use numbers for dimension keys, and haven’t tested the alternative yet. Using an integer like this for the key in a date dimension is generally considered best practice.

I also have a column which is the actual date itself. I will use this as the Value column for the dimension key in Analysis Services. I also have various representations of the date in string form, such as “Tuesday December 8th, 2009”, “08/12/2009”, “8-Dec-2009”. One of these will be the Name column, but I may have others available for other properties and translations. A “12/08/2009” option may be preferable for a US translation, for example.

Columns in my table should indicate which year it is, such as 2009. I’ll also throw in the start of the year (in a date format), and something which indicates which Financial Year it is. In Australia, this is most easily handled by adding six months onto the current date and considering the year of this adjusted date (our FY starts on July 1st). I can subtract the six months back again to work out what the start of the Financial Year is. I try to keep things in the code quite simple, as I leave this code with the client and hope they can maintain it as required. The trickiest I get is to use the DATEADD(month,DATEDIFF(month,0,ActualDate),0) technique for truncation, but I think this should be required knowledge when handling dates.

For months, quarters, semesters, weeks, and so on, I will also prefer to have an integer as the key. A Month Key would take the format 200912 for this month, or 201001 for next month. Quarters can be done using 20094 and 20101, and so on.

This may all seem quite complex, but it’s something you only need to do one time.

Let me explain…

My table only really contains one field. Yes, just one. More might be required for custom fields, but where possible, I will just populate one field and let all the rest be handled using computed columns.

Even the primary key will be a computed column.

CREATE TABLE dbo.Dates (
  DateKey AS CONVERT(int, CONVERT(char(8), ActualDate, 112)) PERSISTED NOT NULL
,ActualDate DATETIME NOT NULL
,CalendarYearKey AS YEAR(ActualDate) PERSISTED NOT NULL
,CalendarYearName AS CONVERT(char(4), YEAR(ActualDate)) PERSISTED NOT NULL
,CalendarYearStart AS DATEADD(year,DATEDIFF(year,0,ActualDate),0) PERSISTED NOT NULL
,FinancialYearKey AS YEAR(DATEADD(month,6,ActualDate)) PERSISTED NOT NULL
,FinancialYearName AS CONVERT(char(4),YEAR(DATEADD(month,6,ActualDate))-1) + ‘/’ + RIGHT(CONVERT(char(4),YEAR(DATEADD(month,6,ActualDate))),2) PERSISTED NOT NULL
,FinancialYearStart AS DATEADD(month,-6,DATEADD(year,DATEDIFF(year,0,DATEADD(month,6,ActualDate)),0)) PERSISTED NOT NULL
,MonthKey AS CONVERT(int, CONVERT(char(6),ActualDate,112)) PERSISTED NOT NULL
,MonthName AS CASE MONTH(ActualDate)
                 WHEN 1 then ‘Jan’
                 WHEN 2 then ‘Feb’
                 WHEN 3 then ‘Mar’
                 WHEN 4 then ‘Apr’
                 WHEN 5 then ‘May’
                 WHEN 6 then ‘Jun’
                 WHEN 7 then ‘Jul’
                 WHEN 8 then ‘Aug’
                 WHEN 9 then ‘Sep’
                 WHEN 10 then ‘Oct’
                 WHEN 11 then ‘Nov’
                 WHEN 12 then ‘Dec’
               END
                  + ‘ ‘ + CONVERT(char(4), YEAR(ActualDate)) PERSISTED NOT NULL
,FrenchMonthName AS CASE MONTH(ActualDate)
                 WHEN 1 THEN ‘janv’
                 WHEN 2 THEN ‘févr’
                 WHEN 3 THEN ‘mars’
                 WHEN 4 THEN ‘avr’
                 WHEN 5 THEN ‘mai’
                 WHEN 6 THEN ‘juin’
                 WHEN 7 THEN ‘juil’
                 WHEN 8 THEN ‘août’
                 WHEN 9 THEN ‘sept’
                 WHEN 10 THEN ‘oct’
                 WHEN 11 THEN ‘nov’
                 WHEN 12 THEN ‘déc’
               END
                  + ‘ ‘ + CONVERT(char(4), YEAR(ActualDate)) PERSISTED NOT NULL
–Many more columns following
);
GO

You will notice that I have used ugly long CASE statements for the MonthName columns. I do the same for the names of the days of the week. The reason is betrayed in the second example. DATENAME (or any kind of conversion that relies upon the language setting, such as CONVERT(char(3),ActualDate,100)) is non-deterministic, and therefore can’t be used in a persisted computed column (I do wish that CONVERT could take a Language setting, so that I could tell it to convert in English, French, etc, and make it deterministic). Why do I want them to persist? Well… I’m just more comfortable with them being persisted. After all, I could use a view for the whole thing at this stage, but I’m really not that comfortable with the table being generated on the fly when it comes to processing. The table is essentially read-only, after all.

As well as many computed columns like this, I will also have some that are not computed, such as a column to indicate if it’s a public holiday. This could be computed, at a push, as public holidays generally follow a system. Even Easter follows a formula that could be applied. But if the company takes a special day, or if government declares an extra day for some reason, then problems can start popping up. I find it convenient to have columns that can be updated directly (but which have defaults, of course).

One great thing about this method is that it can be populated very easily. The only field you insert data into is the ActualDate column. Generating a list of dates is as easy as using DATEADD() with a nums table, as I’ve written many times before, including this StackOverflow question. If you need more dates, just insert more.

As I mentioned before, a view could be used for this. It is very easy to generate a list of dates, and then all the other calculations could be done as other columns in the view. You could perform an OUTER JOIN into a table which lists public holidays and other special days. Analysis Services will happily handle this in much the same way. I just prefer to have it exist as a table, which I feel I have more control over.

This post has been part of T-SQL Tuesday, hosted this month by Adam Machanic. You should be able to see many other posts related to datetime mentioned as Trackbacks to Adam’s post.