The end of collation woes in SQL Server

Collation is a wonderful thing. It’s what makes sure that Greek people can see their data in a Greek order, and Germans can see theirs in a German order. It helps us determine whether data should be case sensitive, or whether those little dots and circles that Swedes use really matter.

But if you have a database that has different collation settings to the SQL instance it’s on, it can be a pain.

Suppose you have an instance of SQL Server, and you decide to restore a database on it. It may be an upgrade, it may be part of a consolidation exercise. Regardless of the reason, it turns out that the instance’s default collation is not the same as the instance that the database was on previously. This will affect the tempdb database.

So your application is ticking along nicely on the new server, until such time as the application needs to create a temporary table. This gets created in tempdb of course. Like most T-SQL code, the application doesn’t specify the collation for this temporary table, and the columns pick up tempdb’s default collation. Ah. Now as soon as you try to perform a join between this temporary table and a table in your application’s database, you get a collation error:

Msg 468, Level 16, State 9, Line 1
Cannot resolve the collation conflict between … in the equal to operation.

And you find yourself needing to change code, or change the SQL installation, or something to be able to resolve this.

But you just want the error to go away. You want it to just pick a collation, maybe the one from the column on the left of the comparison, maybe the one from the right. So long as the error goes away, you may not be all that interested. Chances are it’s just the difference between a Windows collation and a SQL collation, and the differences aren’t significant enough for you to care (I know, there are plenty of cases where you really do care – in which case you probably don’t have so much frustration with this as the rest of us).

Right now, there is no way of making that error go away. But I’m suggesting to Microsoft that there be a database-level option (or an instance-level option, or both – I don’t mind how it gets implemented), that will detect a collation error in a query and handle it in an appropriate way. After all, a collation error is detected before the query is executed, so changing the way it’s handled should be easy enough.

If you think this is worth voting for, then we might be able to see it implemented, and collation errors may become a thing of the past. They shouldn’t be ignored completely – I just don’t want to be the victim of coding that doesn’t specify the collation.

Vote at: http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=324910

Querying distances using Spatial data types in SQL 2008

SQL Server 2008 has spatial types which are very useful for all kinds of applications. I thought it would be nice to produce one of those charts like you get in maps, telling you the distance between cities.

So I start by populating the data. This uses the geography data type, which understands about the spherical nature of the world. The data is populated using POINTS, using data which I found around the internet.

create table dbo.Cities
(id int identity(1,1) primary key,
name char(3) not null,
fullname varchar(100) not null,
location geography);

insert into dbo.cities values
(‘ADL’,’Adelaide’, ‘POINT(-34.92 138.58)’),
(‘MEL’,’Melbourne’, ‘POINT(-37.78 144.97)’),
(‘SYD’,’Sydney’, ‘POINT(-34 151)’),
(‘PER’,’Perth’, ‘POINT(-31.95 115.87)’),
(‘BRI’,’Brisbane’, ‘POINT(-27.48 153.13)’),
(‘CAN’,’Canberra’, ‘POINT(-35.3 149.12)’),
(‘HOB’,’Hobart’, ‘POINT(-42.87 147.32)’)
;

Now that this is populated, a simple query will produce the results that can be put into a report. STDistance is a method which applies to one geography value, accepting another as a parameter.

select c1.name city1, c2.name city2, c1.location.STDistance(c2.location) / 1000 distance
from dbo.Cities c1
    join dbo.Cities c2
    on c2.id <= c1.id
;

distancesNow, I can make a simple matrix in SSRS, using =IIF(Fields!city1.Value = Fields!city2.Value, Fields!city1.Value, Sum(Fields!distance.Value)) as the value in the centre of the matrix, with city1 and city2 on the rows and columns. This formula means that through the main diagonal of the matrix, the city names will be displayed. Hiding the header rows and columns, we quite quickly achieve something that looks like the image here. Of course, if you’re doing this for a proper application instead of just a proof-of-concept, you should take the time to format it more nicely. Colours, borders, that type of thing. But you get the idea.

I’m not sure how useful this is to your own applications – but it’s so easy to query this kind of spatial data, I encourage you to start considering it to be a useful part of your toolset.

This week, a tour of SQL Server User Groups in Australia starts, covering the spatial features in SQL Server 2008. The list of the meetings are at http://www.sqlserver.org.au, and includes all seven groups.