Active Directory Lingering Objects, Journal Wraps, USN Rollbacks, Tombstone Lifetime, and Event IDs 13568, 13508, 1388, 1988, 2042, 2023, 2095, 1113, 1115, 2103, and more …



Ace Fekay, MCT, MVP, MCITP EA, Exchange 2010 Enterprise Administrator, MCTS Windows 2008, Exchange 2010 & Exchange 2007, MCSE 2003/2000, MCSA Messaging 2003
Microsoft Certified Trainer
Microsoft MVP: Directory Services


Posted 12/27/2011
Updated 1/27/2012 – Clarified some of the Event ID 2042 repair steps
Updated 3/27/2012 – Added info about and how to recover from USN Rollbacks




Journal Wrap – What does it mean?


To summarize, a Journal Wrap indicates a DC is trying to replicate with another DC but has lost its place, usually because the bad DC’s FRS service was shut off for some reason. The Wrap error is based on the USN log, also known as the USN Journal. Everything that gets replicated has a USN, or Update Sequence Number. Each DC has its own, and other DCs keep track of them so they know whether they have the other DCs’ latest changes and are up to date on their own end. The USN Journal keeps track of changes made to any NTFS drive, whether for DFS, DC replication of SYSVOL, etc. If changes are made while the FRS service is shut down, enough time may pass that, when the FRS service is started again, the last USN it is aware of no longer exists in the journal.
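To make the idea concrete, here is a minimal sketch of a journal wrap using a toy model. The class name, the capacity figure and the USN numbers are all made up for illustration; the real NTFS journal is sized in bytes and FRS keeps its checkpoint in its own database.

```python
from dataclasses import dataclass


@dataclass
class UsnJournal:
    """Toy model of a fixed-size change journal (not the real NTFS structure)."""
    capacity: int       # how many change records the journal can hold (assumed)
    next_usn: int = 1   # USN that the next change will receive

    def record_changes(self, count: int) -> None:
        """Log `count` changes; the oldest records fall off once capacity is exceeded."""
        self.next_usn += count

    @property
    def oldest_usn(self) -> int:
        """Oldest USN still present in the journal."""
        return max(1, self.next_usn - self.capacity)


def is_journal_wrapped(journal: UsnJournal, last_usn_seen_by_frs: int) -> bool:
    """True when the USN that FRS last processed is no longer in the journal."""
    return last_usn_seen_by_frs < journal.oldest_usn


journal = UsnJournal(capacity=1000)
journal.record_changes(500)              # FRS is running and keeps up
frs_checkpoint = journal.next_usn - 1    # last USN FRS processed: 500

journal.record_changes(300)              # FRS stopped, a few changes pile up
print(is_journal_wrapped(journal, frs_checkpoint))   # False

journal.record_changes(5000)             # FRS stopped for a long time
print(is_journal_wrapped(journal, frs_checkpoint))   # True
```

In the second case FRS has no way to know which changes it missed, which is why the replica set has to be reinitialized rather than simply resumed.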


DCs will also protect themselves against Lingering Objects in 2 ways:
(1) By implementing strict replication
(2) By isolating DCs that have NOT replicated with other DCs for more than the tombstone lifetime


To fix it, see the “Fixing a Journal Wrap” section below in this blog.


 


Reference:


Troubleshooting journal_wrap errors on Sysvol and DFS replica sets
http://support.microsoft.com/?id=292438


 



Lingering objects


Lingering Objects occur when a domain controller remains offline longer than the Active Directory Tombstone Lifetime. It may then retain objects that have been permanently deleted from the directory on all other domain controllers in the domain, and replication will be out of sync. The old data on the DC that hasn’t replicated are the Lingering Objects.


If a DC is reintroduced past the tombstone lifetime (its point of no return), it can cause directory inconsistency and, under certain conditions, these objects can be reintroduced into the directory. Hence Lingering Objects.


Also, to determine which DC has the lingering object when there is more than one DC: if all the DCs except one show a Lingering Object error, then the one that does not log a lingering object event is the one holding the lingering object that the other DCs are rejecting.
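That rule of thumb can be sketched in a few lines. The function name and the DC names are hypothetical, and the input is simply what you noted after checking each DC's event log by hand.

```python
from typing import Dict, Optional


def find_lingering_object_holder(dc_logged_error: Dict[str, bool]) -> Optional[str]:
    """Return the single DC that did NOT log a lingering object error.

    `dc_logged_error` maps a DC name to True if that DC shows a lingering
    object event. The rule only applies when exactly one DC is quiet and
    at least one other DC is complaining; otherwise None is returned.
    """
    quiet = [dc for dc, logged in dc_logged_error.items() if not logged]
    noisy = [dc for dc, logged in dc_logged_error.items() if logged]
    if len(quiet) == 1 and noisy:
        return quiet[0]
    return None


observations = {"DC1": True, "DC2": True, "DC3": False}
print(find_lingering_object_holder(observations))   # DC3
```

If more than one DC is quiet, or none are, the rule does not give a clear answer and you will need to dig into the events themselves.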


 


To fix this, see the “Fixing Lingering Objects” section below in this blog.


Good thread regarding the AD Tombstone and Lingering Objects:
Technet Forum: DC offline for 2 months, best way to handle?
http://social.technet.microsoft.com/Forums/en-US/winserverDS/thread/8c74df53-8042-423c-a801-7a7f38fdde7f



Example Event ID 2042:


Event Type:Error
Event Source:NTDS Replication
Event Category:Replication
Event ID:2042
Date:3/22/2005
Time:7:28:49 AM
User:NT AUTHORITY\ANONYMOUS LOGON
Computer:DC3
Description:
It has been too long since this machine last replicated with the
named source machine. The time between replications with this source
has exceeded the tombstone lifetime. Replication has been stopped
with this source.
The reason that replication is not allowed to continue is that
the two machine’s views of deleted objects may now be different.
The source machine may still have copies of objects that have
been deleted (and garbage collected) on this machine. If they
were allowed to replicate, the source machine might return
objects which have already been deleted.
Time of last successful replication:
2005-01-21 07:16:03
Invocation ID of source:
0397f6c8-f6b8-0397-0100-000000000000
Name of source:
4a8717eb-8e58-456c-995a-c92e4add7e8e._msdcs.contoso.com
Tombstone lifetime (days):
60


The replication operation has failed.
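The arithmetic behind this event is easy to check. The sketch below uses the dates from the example event above; the function name is my own.

```python
from datetime import datetime, timedelta


def exceeded_tombstone_lifetime(last_success: datetime,
                                now: datetime,
                                tombstone_days: int) -> bool:
    """True when the gap since the last good replication is longer than the TSL."""
    return now - last_success > timedelta(days=tombstone_days)


last_success = datetime(2005, 1, 21, 7, 16, 3)    # "Time of last successful replication"
event_logged = datetime(2005, 3, 22, 7, 28, 49)   # Date and Time of the event

gap = event_logged - last_success
print(gap.days)                                                        # 60
print(exceeded_tombstone_lifetime(last_success, event_logged, 60))     # True
print(exceeded_tombstone_lifetime(last_success, event_logged, 180))    # False
```

With a 60 day tombstone lifetime this DC crossed the line by about 12 minutes. With 180 days it would still have had plenty of room.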


 


 



Other Event IDs associated with Lingering Objects and Journal Wraps:


2042
2023
1388
1988
1864
13568
Or similar replication related errors from the NTFRS or NTDS event sources.


You may be able to get replication running again; see below about Event ID 2042. However, if you can’t get replication running again, you have to remove the outdated DC from the domain. If the original DC has other services installed, such as Exchange, this will complicate matters. (See the section below about Exchange on a DC.)



References:


Event ID 1388 or 1988 A lingering object is detected Active Directory:
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/77dbd146-f265-4d64-bdac-605ecbf1035f.mspx


Event ID 2042: It has been too long since this machine replicated:
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/34c15446-b47f-4d51-8e4a-c14527060f90.mspx


Event ID 2042: It has been too long since this machine replicated
http://technet.microsoft.com/en-us/library/cc757610(WS.10).aspx


The “Allow Replication With Divergent and Corrupt Partner” setting has to be set on all DCs.
Fixing Replication Lingering Object Problems (Event IDs 1388, 1988, 2042)
http://technet.microsoft.com/en-us/library/cc949124(WS.10).aspx


Event ID 1388 or 1988: A lingering object is detected
http://technet.microsoft.com/en-us/library/cc780362(WS.10).aspx


 




Why or how did this occur?


It could have been for a number of reasons:


  • Using the wrong DNS servers, such as an external DNS server like your ISP’s
  • Using your router as a DNS address (note: your router is not a DNS server)
  • Firewall blocks between the DCs, whether a perimeter firewall or firewall rules on the VPN tunnels
  • Antivirus software: many newer antivirus products sport a “network traffic protect” feature that acts like a firewall and may block replication and other communication traffic
  • Security software blocking necessary traffic
  • Windows Firewall not properly configured
  • Duplicate DNS zones
  • MTU settings altered below 1500 on the VPN tunnel endpoints

First, you have to fix or address the above.


Second, you have one of two choices:


1. If you have one or two DCs, you’ll need to go through the process of editing the registry to force a Journal Wrap restore, let it run, then turn it off. The references listed under Event ID 13568 below supply the steps.
2. If you have numerous DCs, or are having difficulty with this DC, you may want to simply demote or force demote it, seize FSMOs, run a metadata cleanup, and rebuild a new DC from scratch.


 


To make sure your firewall ports are opened, what ports need to be opened, and information on using PortQry to check if the ports are opened, listening or allowed, see the following:


Active Directory Firewall Ports – Let’s Try To Make This Simple
http://msmvps.com/blogs/acefekay/archive/2011/11/01/active-directory-firewall-ports-let-s-try-to-make-this-simple.aspx
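If PortQry is not handy, a rough TCP reachability check can be sketched as follows. The port table is only a handful of well-known TCP ports a DC listens on (my own selection, not the full list from the post above), and the function names are hypothetical. It does not cover UDP or the dynamic RPC port range.

```python
import socket
from typing import Callable, Dict

# A few well-known TCP ports used by domain controllers (assumed subset).
AD_TCP_PORTS: Dict[int, str] = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
    3268: "Global Catalog",
}


def tcp_port_open(host: str, port: int, timeout: float = 2.0,
                  connect: Callable = socket.create_connection) -> bool:
    """Try a TCP connection; any OS-level failure counts as closed or blocked."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return False
    conn.close()
    return True


def check_dc_ports(host: str, connect: Callable = socket.create_connection) -> Dict[str, bool]:
    """Return a {service name: reachable} map for the ports in AD_TCP_PORTS."""
    return {name: tcp_port_open(host, port, connect=connect)
            for port, name in AD_TCP_PORTS.items()}
```

Run `check_dc_ports("dc2.contoso.com")` from one DC against another (the host name here is a placeholder). Anything that comes back False is worth a look at the firewall, antivirus or VPN device in between.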


 To check if you have any Duplicate AD Integrated DNS zones in the AD database:


Using ADSI Edit to Resolve Conflicting or Duplicate AD Integrated DNS zones
http://msmvps.com/blogs/acefekay/archive/2009/09/02/using-adsi-edit-to-resolve-conflicting-or-duplicate-ad-integrated-dns-zones.aspx



Active Directory Tombstone Lifetime


The tombstone lifetime is listed in schema.ini and is set during the promotion of the first DC in the forest. The entry in schema.ini is “tombstoneLifetime=<number of days>”.


Therefore, the AD Tombstone Lifetime setting depends on the OS version used to initially create the first domain in your forest years ago. The value carries on from the original installation, even if you’ve migrated/updated all your DCs to the latest Windows versions and have raised the Forest and Domain Functional Levels. It must be changed manually.


Here’s the breakdown on what your Tombstone Lifetime settings may be:
- Windows 2000 with all SPs = 60 Days
- Windows Server 2003 without SP = 60 Days
- Windows Server 2003 SP1 = 180 Days
- Windows Server 2003 R2 SP1, installed with both R2 disks = 60 Days
- Windows Server 2003 R2 SP1, installed with the 1st R2 disk = 180 Days
- Windows Server 2003 SP2 = 180 Days
- Windows Server 2003 R2 SP2 = 180 Days
- Windows Server 2008 = 180 Days
- Windows Server 2008 R2 = 180 Days


You can find what your current AD Tombstone setting is using one of the following methods. If the query returns <not set>, then it’s 60 days. Let’s change it to 180 days.


dsquery * "CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=Domain,DC=com" -attr tombstoneLifetime
or
dsquery * "cn=directory service,cn=windows nt,cn=services,cn=configuration,dc=corp,dc=domain,dc=com" -scope base -attr tombstonelifetime
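If you collect the output from several forests, interpreting it can be sketched like this. I am assuming the dsquery output is a header line with the attribute name followed by a line with the value (blank when the attribute is not set); check that against your own output before relying on it. The function names are mine.

```python
from typing import Optional

DEFAULT_TSL_DAYS = 60   # what <not set> means, per the note above


def effective_tombstone_lifetime(raw_value: Optional[str]) -> int:
    """Translate the attribute value into days, treating empty or <not set> as 60."""
    if raw_value is None:
        return DEFAULT_TSL_DAYS
    text = raw_value.strip()
    if text == "" or text.lower() == "<not set>":
        return DEFAULT_TSL_DAYS
    return int(text)


def parse_dsquery_output(output: str) -> int:
    """Pull the value out of dsquery output (assumed layout: header, then value)."""
    lines = [line.strip() for line in output.splitlines()]
    values = [line for line in lines
              if line and line.lower() != "tombstonelifetime"]
    return effective_tombstone_lifetime(values[0] if values else None)


print(parse_dsquery_output("  tombstoneLifetime\n  180\n"))   # 180
print(parse_dsquery_output("  tombstoneLifetime\n\n"))        # 60
```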



Or you can use ADSI Edit to find and change it


Double-click Configuration
CN=Configuration
ForestRootDomainName
Services
Windows NT
Right-click CN=Directory Service, and then click Properties
In the Attribute column, click tombstoneLifetime.
Note the value in the Value column. If the value is <not set>, the default value is 60 days.
Change it to 180 days
Close ADSI Edit
Allow replication to occur.



More info:


Adjusting the Tombstone Lifetime, Ulf B. Simon-Weidner’s Blog:
“However, if you want to raise the tombstone lifetime, e.g. from 60 to 180 to match the new default, there’s one scenario which needs to be considered:
“Lets say we have two DCs, DC-Munich and DC-LA (L.A. because that where The Experts Conference will be in April). On DC-Munich we change the tombstoneLifetime from <not set> (=60) to 180. When garbage collection runs on DC-Munich it is bored – it already cleaned up all changes from 60 days ago but we instructed it to keep everything now to 180 days, so the next 120 days garbage collection does not need to do anything.”
http://msmvps.com/blogs/ulfbsimonweidner/archive/2010/02/10/adjusting-the-tombstone-lifetime.aspx


The default tombstone lifetime (TSL) value remains at 60 days instead of increasing to 180 days in Windows Server 2003 R2
http://support.microsoft.com/kb/924890


Excellent blog by Jorge that applies to Windows 2000/Windows 2003 SP1 and prior versions:


If upgrading from Windows 2000 to 2003 SP1 or prior, the Schema will not get updated with the 180 day Tombstone. You will have to do it manually. It was fixed in SP2.
Conclusion:
 •If you install a W2K3 server from the first CD from the W2K3 R2 distribution set, then promote it to a DC and then install the R2 binaries from the second CD, the tombstone lifetime is set to 180 days
 •If you install a W2K3 server from the first CD from the W2K3 R2 distribution set, then install the R2 binaries from the second CD and then promote it to a DC, the tombstone lifetime is set to <not set> which is 60 days!
or simply put… a BUG after installing the R2 binaries, but before promoting the first DC to create the AD forest.
The solution: manually (or through a script or a command line tool) change the value yourself to 180 for the attribute mentioned earlier.
http://blogs.dirteam.com/blogs/jorge/archive/2006/07/23/1233.aspx


 




Event IDs possibly associated with Journal Wraps:


Most common one is:
EventID 13568


Event Type: Error
Event Source: NtFrs
Event Category: None
Event ID: 13568
Date: Whenever
Time: Whenever
User: N/A
Computer: computername
Description:
The File Replication Service has detected that the replica set “DOMAIN
SYSTEM VOLUME (SYSVOL SHARE)” is in JRNL_WRAP_ERROR.



An Event ID 13568 usually means there is corruption in the FRS data in the shared folder that NTFRS uses for replication between DCs, or the DC has been disconnected longer than the tombstone lifetime (tombstone lifetime varies based on operating system), or something else occurred, such as a hiccup with a NIC driver, power surge, etc.


Basically, it’s saying you’ll need to go through the process of editing the registry to force a Journal Wrap restore, let it run, then turn it off. Both links in the references below supply the steps, with the second one right on the first page.


To fix it, see the “Fixing a Journal Wrap” section below in this blog.



References:


EventID 13568
http://eventid.net/display.asp?eventid=13568&eventno=1743&source=NtFrs&phase=1


EventID 13568 and Journal Wrap Error
http://www.petri.co.il/forums/showthread.php?t=7122




 




Event ID 13508:


Example:


Source FRS
Type Error
Description The File Replication Service is having trouble enabling replication from <server> to <server> for <path> using the DNS name <name>. FRS will keep retrying.
Following are some of the reasons you would see this warning.


[1] FRS can not correctly resolve the DNS name <name> from this computer.
[2] FRS is not running on <name>.
[3] The topology information in the Active Directory for this replica has not yet replicated to all the Domain Controllers.


 


This is indicative of FRS replication problems. Check FRS event logs on both computers. If Event ID 13508 is present, there may be a problem with the RPC service on either computer
http://support.microsoft.com/kb/272279
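Reason [1] in the event text is the quickest one to rule out. Here is a minimal sketch that checks whether the partner's DNS name resolves from this machine. The function name is hypothetical, and it uses the operating system's resolver, so run it on the DC that logged the event.

```python
import socket
from typing import Callable, List


def resolve_partner(name: str, resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the IP addresses `name` resolves to, or an empty list on failure."""
    try:
        results = resolver(name, None)
    except OSError:          # socket.gaierror is a subclass of OSError
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in results})
```

Call it with the DNS name quoted in the event, for example `resolve_partner("dc2.contoso.com")` (a placeholder name). An empty list points at DNS; a list of addresses that are wrong or unreachable points at stale records or a firewall.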


To fix it, you’ll need to set the BurFlags options to kick it off again (see “Fixing a Journal Wrap” below). This works only if all ports between all locations have been confirmed open, no local antivirus or security software is blocking necessary network traffic to and from the machine, and the DC has been out of replication for less than 180 days. Any longer, and the DC would have to be force demoted.


The high level steps are basically to first make sure that the FRS service is running. Then, to kick off replication and rebuild SYSVOL, set the BurFlags value on the good DC to D4, then set the BurFlags value to D2 on ALL other DCs in that domain. Please read the detailed steps in the following link, under the section titled “How to rebuild the domain SYSVOL replica set across enterprise environments”:


How to rebuild the SYSVOL tree and its content in a domain.
http://support.microsoft.com/kb/315457


More info here:
Using the BurFlags registry key to reinitialize File Replication
http://support.microsoft.com/kb/290762 


Troubleshooting journal_wrap errors on Sysvol and DFS replica sets in Windows 2000 (EOL)
http://support.microsoft.com/?id=292438


 


 




Is Exchange or any other high demand service installed on the DC?



Exchange, SQL, CRM, SharePoint, etc., will highly complicate recovering the machine. It’s highly suggested not to install anything on a DC other than DNS, DHCP or WINS. Installing any other service or high level app will complicate or vastly affect DC operations and make recovery highly complex, especially if you are at the point that the DC is not recoverable. This will vastly impact Exchange, because Exchange will “lock” on to the GC service on the DC it is installed on and will not look elsewhere for a GC, even if you have numerous GCs.


Read more on Exchange or any other app on a DC, its impact on the DC, and the impact on Exchange or whatever is installed on the DC:


Exchange on a Domain Controller – Ramifications and How to Move Exchange off a DC
http://msmvps.com/blogs/acefekay/archive/2009/08/08/moving-from-exchange-2000-currently-on-a-windows-2000-domain-controller-to-a-new-exchange-2003-server-on-a-windows-2003-member-server.aspx


 




Fixing a Journal Wrap


The following will help to kick off replication and rebuild SYSVOL to get it out of a Journal Wrap state.


Associated Event IDs: 13568, 13508


Note:
If it’s the only DC in the network, then set Burflags to D4 (also known as an authoritative mode restore) to rebuild it from scratch.
If there is more than one DC, you will want to set Burflags to D2 (also known as a nonauthoritative mode restore) to pull a copy from an existing DC.
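The note above boils down to a one-line decision. This sketch only captures that rule for the DC you are repairing; the function name is made up.

```python
def choose_burflags(total_dcs_in_domain: int) -> int:
    """Pick the BurFlags value for the DC being repaired, per the note above.

    Only DC in the domain  -> 0xD4 (authoritative, nothing to pull from).
    Other DCs exist        -> 0xD2 (nonauthoritative, pull SYSVOL from a partner).
    """
    if total_dcs_in_domain < 1:
        raise ValueError("a domain has at least one DC")
    return 0xD4 if total_dcs_in_domain == 1 else 0xD2


print(format(choose_burflags(1), "X"))   # D4
print(format(choose_burflags(3), "X"))   # D2
```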


Keep in mind, using the Burflags key to fix Journal Wrap errors instead of “Enable Journal Wrap Automatic Restore” also prevents you from seeing an empty SYSVOL.


Using “Enable Journal Wrap Automatic Restore” will make NTFRS reinitialize all NTFRS shares and delete all contents in those shares. However, this is a very aggressive and destructive approach and you may lose data, such as any logon scripts, etc.


Instead, just use the “Burflags” key, set it to “D4”, and restart NTFRS. You will see it fix itself, and SYSVOL and NETLOGON will still have their contents after the restore.


 


To perform a nonauthoritative restore (using the D2 option), stop the FRS service, configure the BurFlags registry key, and then restart the FRS service. To do so:


1.Click Start, and then click Run.
2.In the Open box, type cmd and then press ENTER.
3.In the Command box, type net stop ntfrs.
4.Click Start, and then click Run.
5.In the Open box, type regedit and then press ENTER.
6.Locate the following subkey in the registry:


HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup


7.In the right pane, double-click BurFlags.
8.In the Edit DWORD Value dialog box, type D2 and then click OK.
9.Quit Registry Editor, and then switch to the Command box.
10.In the Command box, type net start ntfrs.
11.Quit the Command box.
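For those who prefer the command line over regedit, the same three steps can be sketched as a command list. This only builds the commands and runs nothing. It uses the standard reg add switches (/v, /t, /d, /f) and writes the value in decimal, where 210 is hex D2 and 212 is hex D4. I have not tested reg.exe against this particular key name, which contains a forward slash, so treat it as an assumption and verify the value in regedit afterwards. The function name is mine.

```python
from typing import List

# Registry path taken from step 6 above, with HKLM as the short form of HKEY_LOCAL_MACHINE.
BURFLAGS_KEY = (r"HKLM\System\CurrentControlSet\Services\NtFrs"
                r"\Parameters\Backup/Restore\Process at Startup")


def burflags_restore_commands(value: int) -> List[List[str]]:
    """Build the stop / set BurFlags / start sequence for D2 or D4."""
    if value not in (0xD2, 0xD4):
        raise ValueError("BurFlags must be 0xD2 (nonauthoritative) or 0xD4 (authoritative)")
    return [
        ["net", "stop", "ntfrs"],
        ["reg", "add", BURFLAGS_KEY, "/v", "BurFlags",
         "/t", "REG_DWORD", "/d", str(value), "/f"],
        ["net", "start", "ntfrs"],
    ]


for command in burflags_restore_commands(0xD2):
    print(command)
```

Each inner list can be handed to `subprocess.run(command, check=True)` from an elevated prompt on the DC. Passing 0xD4 instead gives the authoritative variant.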


When the FRS service restarts, the following actions occur:


• The value for BurFlags registry key returns to 0.
• Files in the reinitialized FRS folders are moved to a Pre-existing folder.
• An event 13565 is logged to signal that a nonauthoritative restore is started.
• The FRS database is rebuilt.
• The member performs an initial join of the replica set from an upstream partner or from the computer that is specified in the Replica Set Parent registry key if a parent has been specified for SYSVOL replica sets.
• The reinitialized computer runs a full replication of the affected replica sets when the relevant replication schedule begins.
• When the process is complete, an event 13516 is logged to signal that FRS is operational. If the event is not logged, there is a problem with the FRS configuration.


 



To complete an authoritative restore (using the D4 option), stop the FRS service, configure the BurFlags registry key, and then restart the FRS service. To do so:


1.Click Start, and then click Run.
2.In the Open box, type cmd and then press ENTER.
3.In the Command box, type net stop ntfrs.
4.Click Start, and then click Run.
5.In the Open box, type regedit and then press ENTER.
6.Locate the following subkey in the registry:


HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup


7.In the right pane, double click BurFlags.
8.In the Edit DWORD Value dialog box, type D4 and then click OK.
9.Quit Registry Editor, and then switch to the Command box.
10.In the Command box, type net start ntfrs.
11.Quit the Command box.


When the FRS service is restarted, the following actions occur:


• The value for the BurFlags registry key is set back to 0.
• An event 13566 is logged to signal that an authoritative restore is started.
• Files in the reinitialized FRS replicated directories remain unchanged and become authoritative for direct replication partners. Additionally, the files become authoritative for indirect replication partners through transitive replication.
• The FRS database is rebuilt based on current file inventory.
• When the process is complete, an event 13516 is logged to signal that FRS is operational. If the event is not logged, there is a problem with the FRS configuration.
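The event IDs named in the two lists above tell you where a restore stands. A small sketch of that reading, where the function name and the status strings are my own wording:

```python
from typing import Iterable


def restore_status(event_ids: Iterable[int]) -> str:
    """Summarize FRS restore progress from the event IDs logged since the restart."""
    seen = set(event_ids)
    if 13516 in seen:
        return "complete: FRS is operational"
    if 13565 in seen:
        return "nonauthoritative (D2) restore started, waiting for 13516"
    if 13566 in seen:
        return "authoritative (D4) restore started, waiting for 13516"
    return "no restore event logged yet"


print(restore_status([13565]))
print(restore_status([13566, 13516]))
```

If 13516 never shows up, go back and check the FRS configuration, as the lists above say.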


 


References:


More specifics, see:
Using the BurFlags registry key to reinitialize File Replication Service replica sets:
http://support.microsoft.com/kb/290762


How to rebuild the SYSVOL tree and its content in a domain.
http://support.microsoft.com/kb/315457


Kicking NTFRS to start replicating after SYSVOL non-auth. restore
http://blogs.dirteam.com/blogs/jorge/archive/2006/05/12/Kicking-NTFRS-to-start-replicating-after-SYSVOL-non_2D00_auth.-restore.aspx


 


 




Fixing Lingering Objects:


Associated Event ID: 2042:


It has been too long since this machine replicated
Lingering Objects
EventID 2042
EventID 1388
EventID 1988
EventID 13508



The following reinitializes replication after lingering objects appear, which happens when replication has been failing for longer than the AD Tombstone Lifetime.


Below is from:
Event ID 2042: It has been too long since this machine replicated
http://technet.microsoft.com/en-us/library/cc757610(WS.10).aspx


========================================================================
An example of an Event ID 2042:


Event Type:Error
Event Source:NTDS Replication
Event Category:Replication
Event ID:2042
Date:3/22/2005
Time:7:28:49 AM
User:NT AUTHORITY\ANONYMOUS LOGON
Computer:DC3
Description:
It has been too long since this machine last replicated with the
named source machine. The time between replications with this source
has exceeded the tombstone lifetime. Replication has been stopped
with this source.
The reason that replication is not allowed to continue is that
the two machine’s views of deleted objects may now be different.
The source machine may still have copies of objects that have
been deleted (and garbage collected) on this machine. If they
were allowed to replicate, the source machine might return
objects which have already been deleted.
Time of last successful replication:
2005-01-21 07:16:03
Invocation ID of source:
0397f6c8-f6b8-0397-0100-000000000000
Name of source:
4a8717eb-8e58-456c-995a-c92e4add7e8e._msdcs.contoso.com
Tombstone lifetime (days):
60
The replication operation has failed.


User Action:


Determine which of the two machines was disconnected from the
forest and is now out of date. You have three options:


1. Demote or reinstall the machine(s) that were disconnected.
2. Use the “repadmin /removelingeringobjects” tool to remove
inconsistent deleted objects and then resume replication.
3. Resume replication. Inconsistent deleted objects may be introduced.
You can continue replication by using the following registry key.
Once the systems replicate once, it is recommended that you remove
the key to reinstate the protection.
Registry Key:
HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Allow Replication With Divergent and Corrupt Partner
========================================================================


 


To check which DC has not replicated longer than the tombstone lifetime:


Check each DC for an Event ID 2042,
Then run repadmin /showrepl on this specific DC that shows this error.
The repadmin /showrepl command may also report error 8614 on the DC in question:
=============================================================
Source: Default-First-Site-Name\DC1
******* 1502 CONSECUTIVE FAILURES since 2005-01-21 07:16:00
Last error: 8614 (0x21a6):
            The Active Directory cannot replicate with this server
because the time since the last replication with this server has
exceeded the tombstone lifetime.
=============================================================
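With many partners it helps to pick the failing sources out of saved output. This sketch parses text laid out like the excerpt above (a "Source:" line followed later by a "Last error:" line). Real repadmin /showrepl output carries more detail and the layout varies by version, so treat the parsing as an assumption. The function name is hypothetical.

```python
import re
from typing import List


def sources_with_error(showrepl_text: str, error_code: int = 8614) -> List[str]:
    """List the sources whose 'Last error' matches error_code."""
    failing: List[str] = []
    current_source = None
    for line in showrepl_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Source:"):
            current_source = stripped[len("Source:"):].strip()
            continue
        match = re.match(r"Last error:\s*(\d+)", stripped)
        if match and int(match.group(1)) == error_code and current_source:
            failing.append(current_source)
    return failing


SAMPLE = r"""
Source: Default-First-Site-Name\DC1
******* 1502 CONSECUTIVE FAILURES since 2005-01-21 07:16:00
Last error: 8614 (0x21a6):
            The Active Directory cannot replicate with this server
"""

print(sources_with_error(SAMPLE))
```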


Then run the following procedure to reinitialize replication:


From:
Event ID 1388 or 1988: A lingering object is detected.
http://technet.microsoft.com/en-us/library/cc780362(WS.10).aspx


Then restart replication:


Restart Replication Following an Event ID 2042
To restart inbound replication on the destination domain controller following event ID 2042, you must edit the Allow Replication With Divergent and Corrupt Partner registry entry in
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters.


 


 Steps to kick off replication after an Event ID 2042:


1. Expand “HKLM\System\CurrentControlSet\Services\NTDS\Parameters”


If the registry entry exists in the details pane, modify the entry as follows:


  • In the details pane, right-click Allow Replication With Divergent and Corrupt Partner, and then click Modify.
  • In the Value data box, type 1, and then click OK.

If the registry entry does not exist, create the entry as follows:


  • Right-click Parameters, click New, and then click DWORD Value.
  • Type the name Allow Replication With Divergent and Corrupt Partner, and then press ENTER.
  • Double-click the new entry, type 1 in the Value data box, and then click OK.

2. Expand “HKLM\System\CurrentControlSet\Services\NtFrs\Parameters” and change the value for “Enable Journal Wrap Automatic Restore” from 0 to 1.


  • If the DWORD Value does not exist, create a new one with the exact spelling as above, including spaces but without the quotes.

3. Stop the NTFRS Service (open a command prompt and type “net stop ntfrs”)


4. Start the NTFRS Service (net start ntfrs)


5. Monitor the File Replication Service Event Logs for events:
• 13553 – The DC is performing the recovery process
• 13554 – The DC is ready to pull the replica from another DC.
• 13516 – At this point go to step 6. (the problem is resolved if you receive this event)


6. Using a command prompt type: “net share” and look for the Netlogon and Sysvol Shares to appear. The Journal Wrap error is only fixed after the Domain Controller receives the new SYSVOL replica from a peer Domain Controller. This may take a period of time depending on where your peer DC is located and on bandwidth.


7. Reset the Registry to Protect Against Outdated Replication


When you are satisfied that lingering objects have been removed and replication has occurred successfully from the source domain controller, edit the registry to return the following values to zero (0):


  • Change the value for “Allow Replication With Divergent and Corrupt Partner” to 0.
  • Change the value for “Enable Journal Wrap Automatic Restore” back to 0 if you set it to 1 in step 2.


Reference:


Event ID 2042: It has been too long since this machine replicated
http://technet.microsoft.com/en-us/library/cc757610(WS.10).aspx


Event ID 13568:
http://www.eventid.net/display.asp?eventid=13568&source=


 


 



If you can’t get replication to reinitialize


Now if it continues after these steps, then you would need to run an Authoritative Restore. Do you have a backup? If not, and nothing else is running on it, and you have other DCs, I would force demote it, then re-promote it back into a DC. Or simply transfer FSMOs, demote it and rebuild it from scratch.


If you have many DCs and this is not possible or feasible:


Simply transfer FSMOs, demote it and rebuild it from scratch. If you can’t transfer the FSMOs and/or you can’t demote it properly, force demote it using dcpromo /forceremoval, seize any FSMOs, run a metadata cleanup, and rebuild it from scratch. See the following for more info:


Remove a Current Operational Domain Controller from Active Directory (Includes transferring FSMO roles, DNS settings, Time settings, WINS settings, etc)
http://msmvps.com/blogs/acefekay/archive/2010/10/09/remove-a-current-operational-domain-controller-from-active-directory.aspx


Complete Step by Step Guideline to Remove an Orphaned Domain controller
http://msmvps.com/blogs/acefekay/archive/2010/10/05/complete-step-by-step-to-remove-an-orphaned-domain-controller.aspx




USN Rollbacks


USN Rollbacks occur when a virtualized snapshot (Hyper-V or VMware) is used to recover a DC. Snapshots are not a supported way to restore a DC, because reverting to one puts an older copy of the Active Directory database back into place.


The following is quoted from KB875495:


Basically, a USN Rollback is a “[…] condition that occurs when a domain controller that is running Windows 2000, Windows Server 2003, Windows Server 2008, or Windows Server 2008 R2 starts from an Active Directory database that has been incorrectly restored or copied into place. This condition is known as an update sequence number rollback, or USN rollback.

When a USN rollback occurs, modifications to objects and attributes that occur on one domain controller do not replicate to other domain controllers in the forest. Because replication partners believe that they have an up-to-date copy of the Active Directory database, monitoring and troubleshooting tools such as Repadmin.exe do not report any replication errors.”
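The quote describes why nothing complains: the partner believes it already has everything. Here is a toy model of that behavior. The classes are invented for illustration, and real AD tracks an invocation ID and an up-to-dateness vector per partner rather than a single number.

```python
from typing import Dict, List, Tuple


class DomainController:
    """Toy DC: every change gets the next USN."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.highest_usn = 0
        self.changes: Dict[int, str] = {}

    def make_change(self, description: str) -> int:
        self.highest_usn += 1
        self.changes[self.highest_usn] = description
        return self.highest_usn

    def snapshot(self) -> Tuple[int, Dict[int, str]]:
        return self.highest_usn, dict(self.changes)

    def restore_snapshot(self, snap: Tuple[int, Dict[int, str]]) -> None:
        """Roll back the database without telling anyone (the unsupported case)."""
        self.highest_usn, saved = snap
        self.changes = dict(saved)


class Partner:
    """Toy replication partner that remembers the highest USN it has pulled."""

    def __init__(self) -> None:
        self.high_watermark = 0
        self.received: List[str] = []

    def pull_from(self, dc: DomainController) -> List[str]:
        new = [dc.changes[usn] for usn in sorted(dc.changes)
               if usn > self.high_watermark]
        self.received.extend(new)
        self.high_watermark = max(self.high_watermark, dc.highest_usn)
        return new


dc1, partner = DomainController("DC1"), Partner()
for text in ("create user A", "create user B", "create user C"):
    dc1.make_change(text)
snap = dc1.snapshot()                 # snapshot taken at USN 3
dc1.make_change("create user D")
dc1.make_change("create user E")
partner.pull_from(dc1)                # partner now holds everything up to USN 5

dc1.restore_snapshot(snap)            # DC1 is rolled back to USN 3
dc1.make_change("create user F")      # the new change reuses USN 4
print(partner.pull_from(dc1))         # []
```

User F exists only on DC1, and the partner sees no error because USN 4 is below what it has already recorded.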




Associated Event IDs with USN Rollbacks:


Event Source: NTDS Replication
Event Category: Replication
Event ID: 2095


Event Source: NTDS General
Event Category: Replication
Event ID: 1113


Event Source: NTDS General
Event Category: Replication
Event ID: 1115


Event Source: NTDS General
Event Category: Service Control
Event ID: 2103




How to fix a USN Rollback?


In summary, the easiest way out of a USN Rollback is to simply unplug the machine, run a metadata cleanup, then rebuild it from scratch (do NOT use a cloned image), then promote it back into the domain. You may want to take a look at this:


How to detect and recover from a USN rollback in Windows Server 2003, Windows Server 2008, and Windows Server 2008 R2
Feb 10, 2011 – Explains how to recover when a domain controller is incorrectly rolled back by using an image-based installation of the operating system.
http://support.microsoft.com/kb/875495 


If you opt to “unplug” the DC, as mentioned in the above KB article, then you can follow this step by step:


Complete Step by Step Guideline to Remove an Orphaned Domain controller (including seizing FSMOs, running a metadata cleanup, cleanup DNS, Sites, and more)
http://msmvps.com/blogs/acefekay/archive/2010/10/05/complete-step-by-step-to-remove-an-orphaned-domain-controller.aspx




In addition, there is another option to fix a USN Rollback:


There is also a method by Paul Bergson if you don’t want to unplug the DC, run a metadata cleanup, etc. It is not supported by Microsoft, but it works. You have to dig in and alter the Invocation ID using repadmin, the registry, etc. Tedious, but it works.


Restoring a Virtual DC from a Snapshot, by Paul Bergson
http://blogs.dirteam.com/blogs/paulbergson/archive/2011/01/14/restoring-a-dc-from-a-snapshot.aspx





General References:



Fixing Replication Lingering Object Problems (Event IDs 1388, 1988, 2042)
http://technet.microsoft.com/en-us/library/cc738018(WS.10).aspx


Event ID 2042: It has been too long since this machine replicated
This shows you how to recover a DC that has not replicated for longer than the Tombstone Lifetime
http://technet.microsoft.com/en-us/library/cc757610(WS.10).aspx


Determining the forest Active Directory (and ADAM/ADLDS) Tombstone Lifetime using Joe Richard’s ADFind
http://blog.joeware.net/2010/02/05/1896/


Determine the tombstone lifetime for the forest
http://technet.microsoft.com/en-us/library/cc784932(WS.10).aspx


Reconnecting a Domain Controller After a Long-Term Disconnection
http://technet.microsoft.com/en-us/library/cc786630(WS.10).aspx


What happens when the disconnection of a DC exceeds the Tombstone Lifetime?
http://blogs.dirteam.com/blogs/jorge/archive/2005/11/24/153.aspx


Lingering objects
http://blogs.dirteam.com/blogs/jorge/archive/2006/05/08/Lingering-objects.aspx


Help, I’ve lost my SYSVOL and can’t get it up
http://msmvps.com/blogs/bradley/archive/2007/12/27/help-i-ve-lost-my-sysvol-and-can-t-get-up.aspx


 



======
Related Additional Links


Active Directory Inside Out (5 of 10): DNS Features and Configuration (First Question):
http://www.microsoft.com/technet/community/chats/trans/windowsnet/wnet_111204.mspx


Things to consider when a Windows Server 2003-based domain controller or a Windows 2000-based domain controller runs in a virtual environment (VPC, HyperV or VMWare):
http://support.microsoft.com/?id=888794



Troubleshooting Active Directory Replication Problems
http://technet.microsoft.com/en-us/library/cc738415.aspx


Outdated Active Directory objects generate event ID 1988 in Windows Server 2003
http://support.microsoft.com/kb/870695


Event ID 1388 or 1988: A lingering object is detected
http://technet.microsoft.com/en-us/library/cc780362(WS.10).aspx


Lingering objects may remain after you bring an out-of-date global catalog server back online
http://support.microsoft.com/default.aspx/kb/314282



Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088)
http://technet2.microsoft.com/WindowsServer/en/Library/43e6f617-fb49-4bb4-8561-53310219f9971033.mspx


Fixing Replication Connectivity Problems (Event ID 1925)
http://technet2.microsoft.com/WindowsServer/en/Library/7fcaa311-bc19-479d-9a4e-179704dfe08f1033.mspx?mfr=true


Fixing Replication Topology Problems (Event ID 1311)
http://technet2.microsoft.com/WindowsServer/en/Library/062e8eaa-27e0-4c5e-bc2b-2913ecce24b81033.mspx


 


 


 


Ace Fekay


Corrections, comments and suggestions are welcomed.

DNS on a Read Only Domain Controller (RODC)


Ace Fekay
MCT, MVP, MCITP EA, Exchange 2010 Enterprise Administrator, MCTS Windows 2008, Exchange 2010 & Exchange 2007, MCSE 2003/2000, MCSA Messaging 2003
Microsoft Certified Trainer
Microsoft MVP: Directory Services


Compiled 12/7/2011


 


DNS on an RODC Main Highlights:


  • Changes are not allowed on the read-only DNS zone
    • Records cannot be added manually
    • Dynamic updates cannot be made
    • Dynamic updates are “referred” to a writeable domain controller
  • DNS updates are handled the same as a Secondary Zone
    • RODC returns to client the SOA and name of a 2008 RWDC, 2008 R2 RWDC, or newer. 
    • If no 2008 or 2008 R2 servers exist in the NS list, a 2003 DC will be chosen, but an Event ID 4015 will be generated when it attempts an RSO with a Windows 2003 DC.
              RODC EventID 4015:
              http://support.microsoft.com/kb/969488 
  • Client will attempt a registration request in the zone
  • If DHCP configured with credentials or DnsUpdateProxy group, then DHCP registers client record into the zone
  • RODC performs a “Replicate Single Object” (RSO) Operation
    • The RODC waits a certain amount of time before it replicates the record from the DNS server that it referred the client to through an RSO operation.
    • The wait time depends on two configurable RSO values for DNS (defaults shown):
      • DsRemoteReplicationDelay 30 sec
      • DsPollingInterval 3 min
    • Then it attempts to replicate the updated DNS object in Active Directory
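The refer, wait, then replicate sequence above can be pictured with a small Python sketch. This is only an illustrative model, not how the DNS Server service is actually implemented: the class and field names are made up for this example, and the delay is passed in explicitly to stand in for DsRemoteReplicationDelay.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PendingUpdate:
    """A client update that the RODC referred to a writable DC (hypothetical model)."""
    client_name: str
    referred_to: str     # writable DC/DNS server named in the referral
    referred_at: float   # seconds on any monotonic clock


@dataclass
class RsoScheduler:
    """Tracks referrals and reports which ones are due for an RSO attempt."""
    replication_delay: float  # stands in for DsRemoteReplicationDelay, in seconds
    pending: List[PendingUpdate] = field(default_factory=list)

    def record_referral(self, client_name: str, referred_to: str, now: float) -> None:
        # Remember which client was referred, to which server, and when.
        self.pending.append(PendingUpdate(client_name, referred_to, now))

    def due(self, now: float) -> List[PendingUpdate]:
        """Return (and remove) the referrals whose wait time has elapsed."""
        ready = [p for p in self.pending
                 if now - p.referred_at >= self.replication_delay]
        self.pending = [p for p in self.pending
                        if now - p.referred_at < self.replication_delay]
        return ready
```

For example, with a 30 second delay, a referral recorded at time 0 is not returned by due(now=10.0) but is returned by due(now=30.0), which is the point where the RODC would try to pull that one record from the server it referred the client to.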


 


More specifics regarding the above points:


Appendix A: RODC Technical Reference Topics:
http://technet.microsoft.com/en-us/library/cc754218(WS.10).aspx


 


Summary


Quoted from:
Microsoft Official Curriculum
MOC 6425C, “Configuring and Troubleshooting Windows Server 2008 Active Directory Domain Services”
Module 11, page 11-31:
http://www.microsoft.com/learning/en/us/Course.aspx?ID=6425C 


A DNS server on a Read-Only Domain Controller (RODC) can be authoritative for zones that are replicated to the RODC and can resolve queries for clients that use the RODC as their DNS server.
Of course, a key characteristic of an RODC is that it cannot make changes to Active Directory, so resource records cannot be added manually to the zone on an RODC, and dynamic updates are not accepted from clients.


Dynamic updates are serviced by referring clients to a writeable domain controller when they attempt to send an update to an RODC. It is useful for the RODC to include the client’s updated resource record in the zone as quickly as possible, so the RODC tracks the client that attempted the update, and the writable domain controller to which the client was referred. After a short wait, the RODC performs a replicate single object (RSO) operation in which it retrieves the updated DNS record for the client from the writable domain controller, bypassing standard replication mechanisms.




Event 4015 on an RODC:


An RODC will only replicate updates to itself from a DC/DNS server that runs Windows Server 2008 or newer, and that server must be in the NS list.


The RODC does not hold a writeable copy of the DNS zone. When a client queries the RODC for the SOA record, the RODC returns the name of a writable domain controller from the NS list that runs Windows Server 2008 or later and hosts the Active Directory–integrated zone, just as a secondary DNS server handles updates for zones that are not Active Directory–integrated.


After it receives the name of a writable domain controller that runs Windows Server 2008 or later, the client is then responsible for performing the DNS record registration against the writeable server. The RODC waits a certain amount of time (see the links below for the specific wait values), and then it attempts to replicate the updated DNS object in Active Directory Domain Services (AD DS) from the DNS server that it referred the client to through an RSO operation.


If a writable 2008 DC is not accessible, the RODC queries for the NS records and picks any entry present there. The RODC then attempts to perform an RSO (ReplicateSingleObject) operation with the selected name server. If the selected entry is a Windows Server 2003 DC, the operation fails (since Server 2003 doesn’t understand RSO) and an event is logged on the RODC.


Therefore, if there are no 2008 or newer RWDCs in the NS list, and the RODC chooses a 2003 DC, then the RODC will generate an Event 4015 when it tries to perform the RSO operation with a DNS server that runs Windows Server 2003.
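To make the selection behavior easier to follow, here is a simplified Python sketch of the logic described above. It is an assumption-laden model rather than the actual DNS server code: the NameServer fields and the pick_referral_target function are hypothetical, and because the real rule for picking “any” entry is not documented here, the sketch simply takes the first one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NameServer:
    """One entry in the zone's NS list (illustrative fields only)."""
    name: str
    os_year: int      # e.g. 2003, 2008
    writable: bool
    reachable: bool


def pick_referral_target(
    ns_list: List[NameServer],
) -> Tuple[Optional[NameServer], bool]:
    """Return (chosen server, whether an Event 4015 would be expected)."""
    # Preferred case: a reachable, writable DC running 2008 or newer.
    preferred = [ns for ns in ns_list
                 if ns.writable and ns.reachable and ns.os_year >= 2008]
    if preferred:
        return preferred[0], False
    if not ns_list:
        return None, False
    # Fallback: "any" NS entry. Taking the first is an assumption.
    fallback = ns_list[0]
    # RSO against a pre-2008 DC fails, which is what produces Event 4015.
    return fallback, fallback.os_year < 2008
```

With an NS list that contains both a 2003 DC and a reachable 2008 RWDC, the function returns the 2008 server and no event. With only a 2003 DC in the list, it returns that server and flags that a 4015 would be expected.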


If there are any non-contactable NS entries, or if you’ve removed name server entries (say, because of the way your WAN links are designed and connected, you removed any non-contactable DC/DNS servers), and what’s left are 2003 DCs, it will choose one of them.


To resolve this issue, deploy the DNS Server role on a writable domain controller that runs Windows Server 2008 or newer and is accessible from the RODC. Also ensure that it registers an NS record.


I don’t have any information, nor have I found anything in the registry or otherwise, to alter the selection criteria to make sure it chooses a DC that can be contacted. I’m sure that’s a question for the dev team.


The first link below provides more specifics on the 4015 events and on the configurable wait values (defaults: DsRemoteReplicationDelay, 30 sec; DsPollingInterval, 3 min) that determine how long an RODC waits before it performs an RSO operation for the DNS record update.


Appendix A: RODC Technical Reference Topics
For how DNS Updates work, scroll down to “DNS updates for clients that are located in an RODC site”
http://technet.microsoft.com/en-us/library/cc754218(WS.10).aspx


RODC logs DNS event 4015 every 3 minutes with error code 00002095
http://support.microsoft.com/kb/969488


 


Suggestions and Comments welcomed!
Ace Fekay