SharePoint 2010 – ConnectionTimeout issues from WFE to Profile DB

Often (at least when working with SharePoint) you come across something that cramps your style and leaves you wishing the bad man would go away. I’ve just had such a problem, and of course figured I’d project my issues onto you.

Problem – getting this evil error screen in SharePoint 2010.

Coupled with a “ConnectionTimeout” error in the ULS and Event log…

PowerShell_ISE.exe | Database | EventID 880i
– System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding

– at Microsoft.Office.Server.Data.SqlSession.ExecuteReader(SqlCommand command, CommandBehavior behavior, SqlQueryData monitoringData, Boolean retryForDeadLock) 

– SqlError: ‘Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.’    Source: ‘.Net SqlClient Data Provider’ Number: -2 State: 0 Class: 11 Procedure: ” LineNumber: 0 Server: ‘xxx.xxx.xx.xx’

– ConnectionString: ‘Data Source=xxx.xxx.xx.xx;Initial Catalog=xxx_xxx_profileDb_xxx;Integrated Security=True;Enlist=False;Asynchronous Processing=False;Connect Timeout=15’    ConnectionState: Closed ConnectionTimeout: 15

– ProfileEnumerator.PopulateQueue() Exception: System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.   

Scenario:

Multi-server environment (WFE, App/CA, SQL)…
Multi-domain (Claims) environment, using ADFS and local ADDS….

The DMZ is currently split into 2 zones (and yes, we are moving SQL into a third zone)…

The problem we were seeing was “ConnectionTimeout” errors from the WFE in Zone 1 to the Profile DB (User Profile Service/sync) in Zone 2 each time we attempted to retrieve a list of profiles via the UserProfileManager.

The ULS and Event logs showed the error, but not what exactly was causing it, so we had to retrieve the profiles manually. The choice was PowerShell, of course.

# Load the SharePoint server assemblies
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Server")
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Server.UserProfiles")
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint")

$site = New-Object Microsoft.SharePoint.SPSite("https://xxx.xxxx.com")
try {
    $ServiceContext = [Microsoft.SharePoint.SPServiceContext]::GetContext($site)
    $ProfileManager = New-Object Microsoft.Office.Server.UserProfiles.UserProfileManager($ServiceContext)
    $AllProfiles = $ProfileManager.GetEnumerator()

    If (!($?)) { Throw " - Facepalm!" }

    # $userProfile rather than $profile, which is a built-in PowerShell variable
    foreach ($userProfile in $AllProfiles)
    {
        $DisplayName = $userProfile.DisplayName
        $AccountName = $userProfile[[Microsoft.Office.Server.UserProfiles.PropertyConstants]::AccountName].Value
        $workEmail = $userProfile[[Microsoft.Office.Server.UserProfiles.PropertyConstants]::WorkEmail].Value
        Write-Host $DisplayName, ";", $AccountName, ";", $workEmail, ";"
    }

    Write-Host "Finished."
}
catch {
    Write-Warning " - $($_.Exception.Message)"
    throw "more facepalm!"
}
finally {
    # Dispose the SPSite even when the catch block rethrows
    $site.Dispose()
}

Ran this from the WFE and voila – we got the ConnectionTimeout error here as well.

Next step was to capture the network packets going across the wire, because a simple PowerShell script that just so happened to query the same database the UserProfileManager was connecting to ran successfully and brought back the profiles we were looking for. OK, so it’s not a blocked port causing this.
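For reference, a direct query of that kind could look roughly like the sketch below. The original script isn't reproduced here, so treat this as an assumption about its shape: the server and database names are the same masked placeholders as in the log entries, and the query against sys.tables is only a stand-in to prove the round trip works.

```powershell
# Placeholders (masked as in the ULS entry): substitute your own server and profile DB.
$server   = "xxx.xxx.xx.xx"
$database = "xxx_xxx_profileDb_xxx"

# Same basic shape as the connection string shown in the ULS entry.
$connectionString = "Data Source=$server;Initial Catalog=$database;Integrated Security=True;Connect Timeout=15"

$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
try {
    $connection.Open()
    $command = $connection.CreateCommand()
    # Stand-in query: list a few tables just to confirm we can reach the database.
    $command.CommandText = "SELECT TOP 5 name FROM sys.tables ORDER BY name"
    $reader = $command.ExecuteReader()
    while ($reader.Read()) {
        Write-Host $reader["name"]
    }
    $reader.Close()
}
finally {
    # Close and release the connection whether or not the query succeeded.
    $connection.Dispose()
}
```

If this returns rows from the WFE while the UserProfileManager call still times out, the port itself is open and the problem lies elsewhere.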

After talking to a bright network engineer/architect, we were asked to run Wireshark on the WFE and capture the traffic. We did, and the engineer was nice enough to read through the massive lump of data this generated. Breaking down the traffic showed a lot of reverse DNS lookup requests simply not being responded to. OK, that made sense: first of all, the DNS doesn’t have reverse lookup configured (internet-facing DMZ w/ADDS + DNS), and the “broadcast” from the WFE to find the SQL Server (that was the ConnectionTimeout, btw) would have been blocked by the firewall, so the SQL Server would never have seen the request. Hence…ConnectionTimeout.
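If you don't have a network engineer handy, a quick way to check for the same symptom from the WFE is to time a reverse lookup yourself. This is only a rough sketch, not what we ran at the time: the address is a placeholder you need to replace, and depending on the .NET version a failed lookup may throw or simply hand the address back unchanged.

```powershell
# Placeholder: replace with the real IP address of the SQL Server.
$sqlServerIp = "xxx.xxx.xx.xx"

$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
try {
    # Given an IP address, GetHostEntry asks DNS for the matching host name (reverse lookup).
    $entry = [System.Net.Dns]::GetHostEntry($sqlServerIp)
    Write-Host "Resolved to $($entry.HostName)"
}
catch {
    Write-Warning "Reverse lookup failed: $($_.Exception.Message)"
}
finally {
    $stopwatch.Stop()
    Write-Host ("Lookup took {0:N1} seconds" -f $stopwatch.Elapsed.TotalSeconds)
}
```

A lookup that hangs for several seconds before failing is a good hint that the requests are going unanswered.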

A simple way to fix this is to add the SQL Server details to the hosts file on the WFE server. It’s not ideal, of course, but without getting reverse DNS lookup set up there really isn’t much choice, and reverse DNS lookup has its own security implications that really aren’t ideal for this scenario.
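Adding the entry can be scripted along these lines. The IP address and host name below are made up for illustration, so swap in your own, and run it from an elevated prompt since the hosts file is protected.

```powershell
# Hypothetical values: replace with the real SQL Server address and name.
$sqlServerIp   = "10.0.2.15"
$sqlServerName = "sqlprofile01.dmz.example"

$hostsPath = Join-Path $env:SystemRoot "System32\drivers\etc\hosts"

# Only append when the name is not already listed in the hosts file.
$alreadyListed = Select-String -Path $hostsPath -Pattern ([regex]::Escape($sqlServerName)) -Quiet
if (-not $alreadyListed) {
    # Leading newline guards against a hosts file that does not end with one.
    Add-Content -Path $hostsPath -Value "`r`n$sqlServerIp`t$sqlServerName"
}
```

The check before the append keeps the script safe to re-run without piling up duplicate lines.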

Naturally, once the 3rd zone is created and the SQL Server moved into that zone, this “fix” has to be done for both the WFE and the CA/App server.