Last week I noticed an error with Event ID 11 in the domain controller event log:
Event Type: Error
Event Source: KDC
Event Category: None
Event ID: 11
Time: 1:43:18 PM
There are multiple accounts with name MSSQLSvc/ComputerName.DomainName.SysName.Company.Local:1433 of type DS_SERVICE_PRINCIPAL_NAME.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
The computer named in the SPN was one of the process control database servers I had replaced earlier. Our process control servers each have a specific role, and their computer names are tied to that role. This is a side effect of the process control system software that we use, and I cannot change it.
This means that the old server and the new server will have the same name.
The process software also doesn’t allow me to change domain membership while the software is installed, and if anything goes wrong with installing and configuring the new server, I need to be able to bring the old system back online as soon as possible.
So I simply disconnected the old server, joined the new server to the domain under the same name, installed the application, synchronized everything, and the system worked fine. I checked the event log of the new server, but couldn’t see anything suspect. However, looking back through the DC system log, I discovered that the problem started at exactly that time.
I did a little bit of research, and I found out that the servicePrincipalName attribute basically tells anyone who wants to know that a service with a certain principal name (duh) is running with the credentials of the Active Directory account with which it is registered.
Since a given SPN can map to only one account, having duplicates in the directory is bad.
Using ldifde, I found out that the service principal name MSSQLSvc/ComputerName.DomainName.SysName.Company.Local:1433 was linked to both the user account ‘XyzAdmin’ and the computer account ‘SE-XYZ01’.
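Finding the duplicate by eye in a large dump is tedious, so a check like this can be scripted. Below is a minimal Python sketch, assuming the directory was exported to a plain-text LDIF file with ldifde. The function names and the DNs in the sample are made up for illustration, and base64-encoded values (`attr:: ...`) are not decoded.

```python
from collections import defaultdict


def unfold(text):
    """Join LDIF continuation lines (lines that begin with a single space)."""
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def find_duplicate_spns(ldif_text):
    """Return {spn: [dn, ...]} for every SPN registered on more than one DN.

    SPNs are compared case-insensitively. Base64-encoded values
    ("attr:: ...") are NOT decoded in this sketch.
    """
    owners = defaultdict(list)
    dn = None
    for line in unfold(ldif_text):
        if not line.strip():
            dn = None  # a blank line ends the current entry
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "dn":
            dn = value
        elif key == "serviceprincipalname" and dn:
            owners[value.lower()].append(dn)
    return {spn: dns for spn, dns in owners.items() if len(set(dns)) > 1}


# Hypothetical sample data; the DNs are invented for this example.
SAMPLE = """\
dn: CN=XyzAdmin,CN=Users,DC=Company,DC=Local
servicePrincipalName: MSSQLSvc/ComputerName.DomainName.SysName.Company.Local:1433

dn: CN=SE-XYZ01,CN=Computers,DC=Company,DC=Local
servicePrincipalName: MSSQLSvc/ComputerName.DomainName.SysName.Company.Local:1433
servicePrincipalName: HOST/SE-XYZ01
"""

if __name__ == "__main__":
    for spn, dns in find_duplicate_spns(SAMPLE).items():
        print(spn)
        for owner in dns:
            print("   ", owner)
```

In practice the text would come from the exported file instead of the inline sample.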
To fix this, I had to delete the SPN from one of the two accounts, which I did with adsiedit.msc. I checked the service on the SE-XYZ01 server, and the SQL Server service was configured to run as local service. This means that the correct SPN link is to the computer account, and not to the XyzAdmin account.
Unfortunately I couldn’t check anymore because the old server was already ‘recycled’, but I seem to remember that the SQL service on the old server was configured to run with the XyzAdmin account instead. When I deleted the link, I wrote an entry in the server logbook, noting exactly what I removed and where, and I also saved the deleted info in a text file ‘just in case’.
From the moment I did this the error no longer occurred, so I evidently deleted the right SPN. Even if it had been the wrong one, I could easily have put it back with either setspn or adsiedit.msc.
I think I will make it part of the server replacement procedure to make an ldifde dump before and after, so that I can more easily diagnose possible problems. I also added a line in my daily backup scripts to make a backup of this dump every night for solving problems like this.
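Comparing two such dumps is also easy to script. The following is a rough Python sketch, assuming both dumps are plain-text LDIF exports. `spn_owners` and `changed_spns` are names I made up, the sample DNs are invented, and base64-encoded values are ignored.

```python
def spn_owners(ldif_text):
    """Map each SPN (lowercased) to the set of DNs that carry it."""
    owners = {}
    dn = None
    # "\n " marks a folded LDIF line; removing it undoes the fold.
    for line in ldif_text.replace("\n ", "").splitlines():
        if not line.strip():
            dn = None  # a blank line ends the current entry
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "dn":
            dn = value
        elif key == "serviceprincipalname" and dn:
            owners.setdefault(value.lower(), set()).add(dn)
    return owners


def changed_spns(before_text, after_text):
    """Return {spn: (owners_before, owners_after)} for SPNs whose owners differ."""
    before, after = spn_owners(before_text), spn_owners(after_text)
    return {
        spn: (before.get(spn, set()), after.get(spn, set()))
        for spn in before.keys() | after.keys()
        if before.get(spn, set()) != after.get(spn, set())
    }


# Hypothetical dumps taken before and after a server replacement.
BEFORE = """\
dn: CN=XyzAdmin,CN=Users,DC=Company,DC=Local
servicePrincipalName: MSSQLSvc/ComputerName.DomainName.SysName.Company.Local:1433
"""

AFTER = BEFORE + """
dn: CN=SE-XYZ01,CN=Computers,DC=Company,DC=Local
servicePrincipalName: MSSQLSvc/ComputerName.DomainName.SysName.Company.Local:1433
"""

if __name__ == "__main__":
    for spn, (old, new) in changed_spns(BEFORE, AFTER).items():
        print(spn, "before:", sorted(old), "after:", sorted(new))
```

Any SPN that ends up with more than one owner in the "after" set is a candidate for the KDC error above.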
Another thing I thought of later was to make a disk image of the old server before replacing it. That way I can uninstall the application, take the computer out of the domain correctly, and install the new server. If anything goes wrong, I can still restore the old system to the exact state from which I started, without having to waste time on the backup and recovery procedures, which can take a long time for certain servers.
As an afterthought I asked my fellow MVPs what the point was of having SPNs in AD in the first place. After all, if a service runs with certain credentials, it will be authenticated when it starts, so what is the added value of registering that information persistently in AD?
This is what MVP Joe Kaplan had to say:
Kerberos uses SPNs extensively.
When a Kerberos client uses its TGT to request a service ticket for a specific
service, the service is actually identified by its SPN. The KDC will grant
the client a service ticket that is encrypted in part with a shared secret
that the service account as identified by the AD account that matches the
SPN has (basically the account password).
In the case of a duplicate SPN, what can happen is that the KDC will
generate a service ticket that may be created based on the shared secret of
the wrong account. Then, when the client provides that ticket to the service
during authentication, the service itself cannot decrypt it and the auth
fails. The server will typically log an “AP Modified” error and the client
will see a “wrong principal” error code. I forget the exact error code and
description, but hopefully that’s close enough.
So, duplicate SPNs are very bad, much in the same way that duplicate UPNs
are bad. Both can cause Kerb auth to break and Windows uses Kerb for auth
everywhere it can.
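To make Joe’s explanation concrete for myself, here is a toy Python model of the failure. It is emphatically not real Kerberos: an HMAC tag stands in for ticket encryption, the passwords and the `pick` parameter are invented, and I am not claiming anything about which account a real KDC would choose when an SPN is duplicated. It only shows why a ticket built from the wrong account’s secret is useless to the service.

```python
import hashlib
import hmac


def derive_key(password):
    """Toy stand-in for deriving an account's long-term key from its password."""
    return hashlib.sha256(password.encode()).digest()


# Hypothetical directory: two accounts carry the same SPN (the duplicate).
SPN = "MSSQLSvc/ComputerName.DomainName.SysName.Company.Local:1433"
DIRECTORY = [
    {"name": "XyzAdmin", "spns": {SPN}, "key": derive_key("user-secret")},
    {"name": "SE-XYZ01", "spns": {SPN}, "key": derive_key("machine-secret")},
]


def issue_ticket(directory, spn, pick=0):
    """Toy KDC: find accounts carrying the SPN and seal a ticket with one key.

    `pick` selects which match is used; it exists only to simulate the
    ambiguity caused by a duplicate SPN.
    """
    matches = [acct for acct in directory if spn in acct["spns"]]
    chosen = matches[pick]
    payload = f"ticket for {spn}".encode()
    tag = hmac.new(chosen["key"], payload, hashlib.sha256).digest()
    return payload, tag


def service_accepts(service_key, ticket):
    """Toy service: the ticket is usable only if it was sealed with our key."""
    payload, tag = ticket
    expected = hmac.new(service_key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


if __name__ == "__main__":
    # The service actually runs as the computer account.
    service_key = derive_key("machine-secret")
    for pick in (0, 1):
        ticket = issue_ticket(DIRECTORY, SPN, pick=pick)
        print(DIRECTORY[pick]["name"], service_accepts(service_key, ticket))
```

With the duplicate removed, only one account matches the SPN, so the ticket can only be built from the secret the service actually holds.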