Exchange DAG Cluster Service Terminated with Error 7024

I ran into an interesting issue at a client site yesterday on an Exchange 2010 SP1 DAG member. One DAG member's databases would not mount (not even the Public Folder database), and the Cluster service would not start. The DAG is a three-member stretched DAG, with two nodes in the main site and the third in the DR site.



The events logged on the affected node, newest first, were:



Log Name:      System
Source:        Service Control Manager
Date:          10/23/2011 12:07:44 AM
Event ID:      7024
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      EX03.domain.local
Description:
The Cluster Service service terminated with service-specific error Log service encountered an invalid log block..



Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          10/23/2011 12:00:43 AM
Event ID:      1177
Task Category: Quorum Manager
Level:         Critical
Keywords:     
User:          SYSTEM
Computer:      EX03.domain.local
Description:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.



Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          10/23/2011 12:00:43 AM
Event ID:      1135
Task Category: Node Mgr
Level:         Critical
Keywords:     
User:          SYSTEM
Computer:      EX03.domain.local
Description:
Cluster node 'EX02' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.



Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          10/23/2011 12:00:43 AM
Event ID:      1135
Task Category: Node Mgr
Level:         Critical
Keywords:     
User:          SYSTEM
Computer:      EX03.domain.local
Description:
Cluster node 'EX01' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

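The cluster failed at 12:00:43 AM, when a network failure in the active site took two of the three DAG members (EX01 and EX02) offline and out of cluster membership (event 1135). That left EX03, the member in the DR site, holding only one of the three node votes, so it lost quorum and its Cluster service shut down (event 1177). For some reason, this also corrupted the CLUSDB.BLF file on EX03, which prevented that node from coming back online when the network recovered: the Cluster service then terminated with error 7024, "Log service encountered an invalid log block." CLUSDB.BLF is a Common Log File System (CLFS) base log file used by the Cluster service; it contains the metadata CLFS uses to manage access to the log data.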
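
The quorum arithmetic behind event 1177 can be made concrete with a short Python sketch. It assumes the Node Majority model (one vote per node, no file share witness), which is what Exchange 2010 uses for a DAG with an odd number of members. The `has_quorum` function is hypothetical and only models the vote count, not the Cluster service itself.

```python
from __future__ import annotations


def has_quorum(reachable: dict[str, bool]) -> bool:
    """Return True if the reachable nodes hold a strict majority of votes.

    Hypothetical model of Node Majority: one vote per DAG member and
    no file share witness vote.
    """
    total_votes = len(reachable)
    votes_present = sum(1 for is_up in reachable.values() if is_up)
    # Strict majority: more than half of all configured votes.
    return votes_present > total_votes // 2


# EX03's view after the network failure in the main site
view_from_ex03 = {"EX01": False, "EX02": False, "EX03": True}
print(has_quorum(view_from_ex03))  # False

# The main-site view, had only the link to the DR site failed (hypothetical)
view_from_main_site = {"EX01": True, "EX02": True, "EX03": False}
print(has_quorum(view_from_main_site))  # True
```

Because a strict majority is required, a lone surviving node in a three-node DAG cannot keep the cluster running, while the two main-site nodes together could have.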



To correct the problem, navigate to the %WINDIR%\Cluster folder on the affected node and rename CLUSDB.BLF to CLUSDB.BLF.OLD, then restart the Exchange server. The Cluster service generates a new CLUSDB.BLF file on restart, so it is able to start and the databases will mount.
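
If you need to repeat the rename step, it can be scripted. The following is a minimal Python sketch, not part of the original fix: the function name `rename_clusdb_blf` and its `cluster_dir` parameter are hypothetical, and it assumes the Cluster service is already stopped (as it is after error 7024) and that the script runs with administrative rights. It does not restart the server; that step remains manual.

```python
import os
from pathlib import Path


def rename_clusdb_blf(cluster_dir=None):
    """Rename CLUSDB.BLF to CLUSDB.BLF.OLD so the Cluster service can rebuild it.

    Hypothetical helper. Assumes the Cluster service is stopped so the file
    is not locked. Defaults to %WINDIR%\\Cluster when no folder is given.
    """
    if cluster_dir is None:
        cluster_dir = Path(os.environ.get("WINDIR", r"C:\Windows")) / "Cluster"
    cluster_dir = Path(cluster_dir)
    blf = cluster_dir / "CLUSDB.BLF"
    backup = cluster_dir / "CLUSDB.BLF.OLD"

    if not blf.exists():
        raise FileNotFoundError(f"{blf} not found")
    # Refuse to overwrite a backup left by an earlier attempt, so the
    # behavior is the same on every platform.
    if backup.exists():
        raise FileExistsError(f"{backup} already exists")

    blf.rename(backup)
    return backup
```

On the affected node, calling `rename_clusdb_blf()` with no arguments targets %WINDIR%\Cluster. The explicit check for an existing CLUSDB.BLF.OLD avoids silently discarding a previous backup.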