Steve Koop spoke recently at the Adelaide SQL Server User Group, talking about things which don’t convert particularly nicely when upsizing from Microsoft Access to SQL Server 2008. I think this is a really important thing for SQL people to know, as there seem to be many Access databases living in even the largest organisations.
One of the things he mentioned was DISTINCTROW. I’ve never really known what DISTINCTROW does, so I asked him. He sent me a link which explained the difference between DISTINCTROW and DISTINCT, and it described as “DISTINCTROW works on records, not just individual fields”. This might be good for some people, but I wanted to know a little more.
http://msdn.microsoft.com/en-us/library/aa140015.aspx says “The DISTINCTROW keyword is similar to the DISTINCT keyword except that it is based on entire rows, not just individual fields.” – but it also goes on to say a little more, confirming my suspicions. “It is useful only when based on multiple tables, and only when you select fields from some, but not all, of the tables.”
So DISTINCTROW is more about the JOIN type than anything else. It only applies if you are querying multiple tables, but not returning fields from all of them. So it’s a SEMI JOIN to the unused tables, which you write SQL Server using a WHERE EXISTS clause. It’s not really like DISTINCT at all – it’s about doing a Join without seeing the ‘duplication’ effect, clearly only feasible if you’re not returning columns from the other table.
If you’re not sure what a Semi Join is, then just think about the WHERE EXISTS clause, and it should become clear. If you look at the execution plan of a query in SQL Server that uses WHERE EXISTS, you’ll see that it’s doing a Semi Join. And if you’re looking at queries which use DISTINCTROW, consider changing them to WHERE EXISTS instead.