Windows Server 2012 AD Cloning, Snapshot Support & Preventing USN Rollbacks

Updated 4/16/2014

Preamble

Virtualization, including cloud computing, is a valuable asset for many organizations. However, there were some drawbacks that many administrators weren’t aware of when implementing a Hyper-V infrastructure.

For example, there are ramifications of cloning servers without first running Sysprep on the base image. Sysprep ensures that a new SID (the unique Security Identifier each machine has) is generated at first boot. There are also ramifications with the virtual host’s time synchronization service, which provides time to the virtual guests. If the guests are part of an AD infrastructure, or if the guests are DCs, host time synchronization conflicts with the default AD forest time hierarchy. That matters because Kerberos tolerates only five minutes of clock skew by default. The fix is easy: the recommendation is to disable host time synchronization for the virtual DC guests to prevent this from occurring.
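As a rough illustration of the five-minute rule, the Python sketch below checks whether two clock readings fall within the default Kerberos tolerance (the “Maximum tolerance for computer clock synchronization” policy). The helper name is hypothetical, and treating exactly five minutes as still acceptable is a simplifying assumption, not a statement about how Kerberos handles the boundary.

```python
from datetime import datetime, timedelta

# Default value of the Kerberos policy "Maximum tolerance for computer
# clock synchronization" in an AD domain.
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(client_time: datetime, dc_time: datetime,
                         max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """Return True if the two clocks are close enough for Kerberos.

    Hypothetical helper for illustration; the <= boundary is an assumption.
    """
    return abs(client_time - dc_time) <= max_skew


dc_clock = datetime(2014, 4, 16, 12, 0, 0)
# A guest whose clock drifted 4m59s ahead is still inside the window...
print(within_kerberos_skew(datetime(2014, 4, 16, 12, 4, 59), dc_clock))  # True
# ...but one that drifted 6 minutes ahead is not.
print(within_kerberos_skew(datetime(2014, 4, 16, 12, 6, 0), dc_clock))   # False
```

A guest whose clock is pushed by the host away from the forest time source can drift outside this window, at which point Kerberos authentication against the DC fails.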

One of the more important ramifications, which we will discuss in this article, involves snapshots of virtualized domain controllers, specifically using the Revert feature to roll a virtual machine back to a previous point in time from a previously saved snapshot. The ramifications can effectively render a DC useless.

In this blog, we’ll talk about:

  • What is a Snapshot?
  • What is the USN?
  • What is a USN Rollback?
  • Windows Server 2012 Snapshot Support
  • Windows Server 2012 Cloning Support

What is a Snapshot?

Hyper-V provides the ability to create a point-in-time copy of a virtual guest. The point-in-time copy is called a snapshot. The snapshot can be used to “revert” the virtual guest back to the point in time when the snapshot was created.

Snapshots are a convenient means to return a virtual machine to a previous state, such as to return to a state prior to installing an application that is no longer behaving properly.

What is the USN?

The USN, or Update Sequence Number, is the basis of how Active Directory Replication works.

The USN is a counter value that a domain controller assigns to each attribute change, whether the change was made locally or replicated in from a partner domain controller. Each domain controller keeps track of its own changes, and it also tracks the USN values of the other domain controllers it replicates with.

Active Directory replication relies on Update Sequence Numbers (USNs) on each domain controller. The USN acts as a counter. Each DC maintains its own USN counter, and its values are meaningful only for that domain controller. The replication system is designed with this restriction in mind.

When a domain controller sees that a replication partner has a higher USN than the last one it received from that partner, it makes a replication pull request to replicate the changes from that partner.

Active Directory replication does not depend on or use time displacement or a time stamp to determine which changes need to be propagated. Time-based propagation, as used by some directory services, relies on a time stamp with a “last writer wins” rule; however, this can pose a problem if the clock is rolled back.

A time stamp is used in Active Directory, but only to resolve a conflict when an attribute has been modified on two different DCs at nearly the same time. In this case, the DC receiving the update compares the following three values, in order, to resolve the conflict:

  1. The Version number that is incremented on an attribute by the original writer
  2. The originating time of the original writer
  3. The originating DSA value, which is the GUID of the domain controller (found in ADSI Edit and in the _msdcs.contoso.com DNS zone).
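To make the ordering concrete, here is a minimal Python sketch of the comparison, assuming the three values are checked in the order listed and that the higher value wins at each step. The class and function names are hypothetical, times are plain integers, and comparing GUIDs by their numeric value is a simplification; this is not the actual AD implementation.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class ReplicationStamp:
    """Simplified per-attribute replication metadata (illustrative only)."""
    version: int           # incremented by the originating writer
    originating_time: int  # originating time, e.g. seconds since an epoch
    originating_dsa: UUID  # GUID of the originating DC

    def sort_key(self):
        # Version first, then originating time, then the GUID.
        # Ordering GUIDs by their integer value is an assumption.
        return (self.version, self.originating_time, self.originating_dsa.int)


def resolve_conflict(a: ReplicationStamp, b: ReplicationStamp) -> ReplicationStamp:
    """Return the stamp whose value wins the conflict."""
    return a if a.sort_key() >= b.sort_key() else b


dc_a = UUID("11111111-1111-1111-1111-111111111111")
dc_b = UUID("22222222-2222-2222-2222-222222222222")

# Same version number on both DCs, so the later originating time wins.
write_on_a = ReplicationStamp(version=2, originating_time=1000, originating_dsa=dc_a)
write_on_b = ReplicationStamp(version=2, originating_time=1005, originating_dsa=dc_b)
print(resolve_conflict(write_on_a, write_on_b) is write_on_b)  # True
```

Note that time only matters when the version numbers tie, which is why a rolled-back clock is far less dangerous to AD than to a purely time-based system.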

Because each USN counter is local to its DC, and the local DC keeps track of all of its own changes, replication partner DCs can rely on a DC’s USN values.

The USN can never “run backward” (decrease in value). If it does, replication partner DCs recognize the decreased value as an inconsistency and remove the DC from their replica sets. This is called a USN rollback. Although a USN rollback can be repaired, in many cases it’s easier and more time-efficient to simply force-remove the DC with the USN rollback and re-promote it into the domain.
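The Python sketch below illustrates why an undetected rollback is so damaging. It models a partner’s up-to-dateness vector as a dictionary keyed by the originating DC’s InvocationID and filters out any change whose originating USN the partner has already seen. The InvocationID strings and USN numbers are made up for illustration, and the model ignores the high-watermark, which filters in a similar way.

```python
# Partner DC's up-to-dateness vector (simplified):
# originating InvocationID -> highest originating USN already received.
partner_utd = {"dc-a-invocation-1": 100}


def changes_partner_will_accept(changes, utd):
    """Keep only changes the partner has not already seen.

    changes: list of (invocation_id, originating_usn) tuples.
    """
    return [(inv, usn) for inv, usn in changes if usn > utd.get(inv, 0)]


# DC-A is reverted to a snapshot taken when its USN was 80. It then makes
# 20 new changes, re-using USNs 81..100 under the SAME InvocationID.
after_revert = [("dc-a-invocation-1", usn) for usn in range(81, 101)]
print(len(changes_partner_will_accept(after_revert, partner_utd)))  # 0

# With a new InvocationID (what the Windows Server 2012 VM-GenerationID
# safeguard does), the same USNs are new to the partner and replicate.
after_reset = [("dc-a-invocation-2", usn) for usn in range(81, 101)]
print(len(changes_partner_will_accept(after_reset, partner_utd)))   # 20
```

In the first case the 20 new changes on DC-A are never replicated and nothing reports an error, which is what makes the undetected case so hard to diagnose.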

You can use Ldp.exe or ADSI Edit to read the current USN, which is the highestCommittedUsn attribute that can be found on the RootDSE object properties for the domain controller.

Up-to-Dateness, High-Watermark, Propagation Dampening, InvocationID

Replication takes specific values into account and follows a predefined algorithm to ensure replication consistency among domain controllers and to reduce or eliminate divergence. These values include the following:

Up-To-Dateness vector

  • This is a value that the destination domain controller maintains for tracking the originating updates that are received from all source domain controllers.
    • This value helps the source DC filter irrelevant attributes (and entire objects if all attributes are filtered) on the basis of the relationships between all sources of originating updates and a single destination.
    • To see the up-to-dateness vector, run the repadmin /showvector command (repadmin /showutdvec in newer versions of Repadmin).

High-watermark

  • This is a value that the destination domain controller maintains to keep track of the most recent change that it has received from a specific source domain controller for an object in a specific directory partition.
    • This value prevents irrelevant objects from being considered by the source domain controller with respect to a single destination.
    • To see the value of the High-watermark, run repadmin /showreps /verbose and look for each line that starts “USNs:”. The high-watermark USN is the number that is followed by “/OU”.
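If you want to pull those numbers out of saved output, a small Python parser like the one below works on lines shaped like “USNs: <number>/OU, <number>/PU”. The sample text is a hand-written, abbreviated stand-in for repadmin output, not a real capture, so check the exact format your version of repadmin produces before relying on it.

```python
import re
from typing import List

# Matches lines such as "USNs: 41048/OU, 41048/PU" and captures the
# number that precedes "/OU" (the high-watermark USN).
USN_LINE = re.compile(r"^\s*USNs:\s*(\d+)/OU", re.MULTILINE)


def high_watermarks(repadmin_output: str) -> List[int]:
    """Return the high-watermark USN from each "USNs:" line, in order."""
    return [int(value) for value in USN_LINE.findall(repadmin_output)]


# Hypothetical, abbreviated output for two inbound partners.
sample = """
Default-First-Site-Name\\DC-B via RPC
    USNs: 41048/OU, 41048/PU
Default-First-Site-Name\\DC-C via RPC
    USNs: 39210/OU, 39210/PU
"""
print(high_watermarks(sample))  # [41048, 39210]
```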

Propagation Dampening

Installing multiple DCs provides fault tolerance, and multiple replication paths between them reduce latency. However, with multiple paths you might expect the same change to be replicated in an endless loop. The up-to-dateness vector, together with the InvocationID, eliminates this possibility. The InvocationID of a domain controller combined with its USN forms a forest-wide unique identifier for every write transaction performed on that domain controller.

Replication example

Understanding the consequences of snapshots prior to Windows Server 2012 requires a brief explanation and a basic understanding of how Active Directory replication works.

Scenario: single domain, single AD site, three DCs: DC-A, DC-B, and DC-C, all replication partners of each other.

Replication Steps

  • DC-A updates a password and assigns the change USN 3.
  • DC-B detects a USN change on DC-A.
  • DC-B requests the change from DC-A.
  • DC-B sends its high-watermark and up-to-dateness vector to DC-A.
    • DC-A looks at the high-watermark and up-to-dateness vector values, and at the object that was changed (the password attribute).
    • DC-A sees that the originating DSA for the password change is DC-A (itself).
    • DC-A reads the up-to-dateness vector from DC-B and finds that DC-B is up to date with DC-A’s originating changes only through USN 2.
    • DC-A sees that the originating USN on that password attribute is 3.
  • Because 3 is greater than 2, DC-A sends the changed password to DC-B.

Summary

In summary, propagation dampening occurs when DC-B has already received the changed password from DC-C, which received it from DC-A. DC-B’s up-to-dateness vector then already shows USN 3 from DC-A, so DC-B does not receive the changed password from DC-A a second time.
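The scenario above can be modelled in a few lines of Python. This is a simplified sketch, assuming the source filters outbound changes purely against the destination’s up-to-dateness vector; the DC names stand in for InvocationIDs, and the data structures are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    attribute: str
    originating_dc: str   # stands in for the originating DC's InvocationID
    originating_usn: int


def changes_to_send(pending: List[Change], dest_utd: Dict[str, int]) -> List[Change]:
    """Source-side filtering: skip changes the destination already has."""
    return [c for c in pending
            if c.originating_usn > dest_utd.get(c.originating_dc, 0)]


password_change = Change("unicodePwd", originating_dc="DC-A", originating_usn=3)

# Replication steps: DC-B is up to date with DC-A only through USN 2.
print(len(changes_to_send([password_change], {"DC-A": 2})))  # 1 (sent)

# Propagation dampening: DC-B already got USN 3 from DC-A via DC-C,
# so its vector already records DC-A at USN 3 and nothing is sent.
print(len(changes_to_send([password_change], {"DC-A": 3})))  # 0
```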

Summarized from (and additional reading):
Tracking Updates (Active Directory Replication)
http://technet.microsoft.com/en-us/library/cc961798.aspx


Pre-Windows 2012 Virtualized DC recommendations

  • Do not take snapshots or revert back to a snapshot of a domain controller virtual machine.
  • Do not copy the domain controller VHD file.
  • Do not export the virtual machine that is running a domain controller.
  • Do not restore a domain controller or attempt to roll back the contents of an Active Directory database by any other means than a supported backup solution.


Undetected USN Rollback

From: Running Domain Controllers in Hyper-V
http://technet.microsoft.com/en-us/library/d2cae85b-41ac-497f-8cd1-5fbaa6740ffe(v=ws.10)#usn_and_usn_rollback


Detected USN Rollback

From: Introduction to Active Directory Domain Services (AD DS) Virtualization (Level 100)
http://technet.microsoft.com/en-us/library/hh831734.aspx


Repairing USN Rollbacks

Repairing a USN rollback can be difficult. You can use the replication monitoring and diagnostic tools to determine the extent of the damage. The damage is most severe when the rollback goes undetected. One example is when a DC’s VHD file is copied and run on another virtual host; besides the rollback, the duplicate DC SIDs make the cause extremely difficult to determine. Another is when the USN on a restored DC has increased past the last USN that the other domain controllers received from it. In this case, the USN values of the originating DC are different from what its replication partners believe they should be.

The easiest way to repair a USN rollback is to force remove the domain controller that was reverted, run a metadata cleanup to remove the domain controller’s reference from the AD database, and re-promote it.

Reverting to a snapshot can also have ramifications for other types of services. For one, keep in mind the secure channel that Active Directory members use to communicate with the domain. The secure channel uses a machine account password that is renewed every 30 days by default. If you revert a machine to a snapshot taken before its most recent password change, it may no longer be able to communicate with the domain. To repair this, you can reset the machine account, or disjoin the machine and rejoin it to the domain. For servers such as a Microsoft Exchange server, the implications can be much deeper: besides the secure channel, users will lose any email received between the time the snapshot was taken and the time of the revert.


Windows Server 2012 Snapshot Support Prevents USN Rollbacks

Before Windows Server 2012, cloning, snapshotting, or copying a virtual DC was unsupported. The only supported ways to repair a DC were to restore it with Windows Backup or a third-party backup product that supports non-authoritative or authoritative restores, or simply to force-demote the DC, rebuild it from scratch, and promote it back into the domain. Otherwise, as we’ve discussed, snapshots and cloning have serious ramifications, such as USN rollbacks or lingering objects, just to name a few.

Windows Server 2012 now supports DC cloning and snapshot restore of domain controllers. The requirements to support the new feature are:

  • A hypervisor that supports VM-GenerationID. Windows Server 2012 Hyper-V supports VM-GenerationID. If you are using a third-party hypervisor, check with the vendor whether their latest version supports this feature.
  • The source virtual domain controller must be running Windows Server 2012.
  • A Windows Server 2012 PDC Emulator FSMO Role must be running and available for the cloned DC.


How does the VM-GenerationID work?

When you promote a domain controller on a supported hypervisor, AD DS stores the VM-GenerationID in the msDS-GenerationID attribute of the DC’s computer object in the AD database. The hypervisor exposes the current VM-GenerationID to the guest through a Windows driver in the virtual machine.

If you revert to a snapshot, AD DS reads the current VM-GenerationID value from the driver and compares it to the value stored on the DC’s computer object. The comparison also occurs each time the DC boots.

If the VM-GenerationID values are different:

  • The InvocationID is reset.
  • The RID pool is discarded.
  • The new VM-GenerationID value is written to the AD database, which, together with the new InvocationID, prevents USN values from being reused under the old InvocationID.
  • A non-authoritative SYSVOL synchronization occurs to safely restore and reinitialize SYSVOL (to prevent a JRNL_WRAP error).
  • Because the values are compared each time a DC boots, these actions apply whenever the values differ at startup.
  • This means the safeguards also protect virtual DCs that were shut down when they were reverted.

If the VM-GenerationID values are the same:

  • The transaction is committed normally.
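The decision logic above can be summarized in a short Python sketch. It is a conceptual model only, assuming the stored msDS-GenerationID and the hypervisor’s current value can be compared as opaque strings; the class, field, and function names are hypothetical and do not correspond to any Windows API.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualDCState:
    """Hypothetical, simplified view of a virtual DC's safeguard state."""
    stored_generation_id: str        # value kept in msDS-GenerationID
    invocation_id: uuid.UUID
    rid_pool: Optional[range]        # current RID pool, None once discarded
    sysvol_nonauth_sync_pending: bool = False


def check_generation_id(dc: VirtualDCState, current_generation_id: str) -> bool:
    """Compare generation IDs and apply the safeguards if they differ.

    Returns True if the safeguards were applied.
    """
    if current_generation_id == dc.stored_generation_id:
        return False  # same value: the transaction is committed normally
    dc.invocation_id = uuid.uuid4()                   # reset the InvocationID
    dc.rid_pool = None                                # discard the RID pool
    dc.stored_generation_id = current_generation_id   # record the new value
    dc.sysvol_nonauth_sync_pending = True             # non-authoritative SYSVOL sync
    return True


dc = VirtualDCState("gen-1", uuid.uuid4(), range(1100, 1600))
print(check_generation_id(dc, "gen-1"))  # False: no snapshot was applied
old_invocation = dc.invocation_id
print(check_generation_id(dc, "gen-2"))  # True: reverted, safeguards applied
print(dc.invocation_id != old_invocation, dc.rid_pool)  # True None
```

The sketch only shows the shape of the decision, not how or when Windows actually performs it.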


Windows Server 2012 Cloning

In Windows Server 2012, administrators no longer need to use Sysprep to clone a machine, promote it to a domain controller, and then complete additional tasks such as installing Windows Updates or organization-standard applications. After the first domain controller in a domain has been installed, whether from scratch or from a Sysprepped image, administrators can safely deploy additional domain controllers by simply copying an existing virtual domain controller.

This feature is domain-specific: a domain must already have at least one DC that can be copied. You still want to properly configure DNS settings, validate each DC’s health and replication status, and run the Active Directory Best Practices Analyzer after each DC deployment.

This feature provides the following benefits:

  • Rapid DC deployment
  • Quick restores
  • Optimized private cloud deployments
  • Rapid DC provisioning to quickly meet increased capacity needs

What if I Don’t Want the VM-Generation ID Mechanism to Kick In?

Perhaps there are times when you don’t want this protection, such as when you are trying to clone your environment into a lab. If you follow the rules, the VM-Generation ID mechanism will protect the USN, which is probably not what you want. Worse, if the DCs you’re trying to clone are having trouble replicating SYSVOL, you will have more problems to deal with.

One way to keep the VM-Generation ID mechanism from kicking in at the hypervisor level is to shut down the VMs, do a flat file copy to another hypervisor, and then create a new VM from the existing files. That should help keep the mechanism from kicking in. More info on this and other thoughts:

Cases where VM-GenerationID doesn’t help make Active Directory virtualization-safe -Part 1
http://blogs.dirteam.com/blogs/sanderberkouwer/archive/2013/08/28/cases-where-vm-generationid-doesn-t-help-make-active-directory-virtualization-safe-part-1.aspx

Why Windows Server 2012 AD VM-Generation ID functionality is not an alias for Active Directory anti-USN Rollback functionality
http://blog.joeware.net/2013/02/20/2675/


Additional Reading:

Tracking Updates (USN & Active Directory Replication)
http://technet.microsoft.com/en-us/library/cc961798.aspx

Running Domain Controllers in Hyper-V
http://technet.microsoft.com/en-us/library/d2cae85b-41ac-497f-8cd1-5fbaa6740ffe(v=ws.10)#usn_and_usn_rollback

How to detect and recover from a USN rollback in Windows Server 2003, Windows Server 2008, and Windows Server 2008 R2
http://support.microsoft.com/kb/875495

Steps for deploying a clone virtualized domain controller
http://technet.microsoft.com/en-us/library/hh831734.aspx#steps_deploy_vdc

Virtual Domain Controller Cloning in Windows Server 2012
http://blogs.technet.com/b/askpfeplat/archive/2012/10/01/virtual-domain-controller-cloning-in-windows-server-2012.aspx

By Ace Fekay

MCT, MVP, MCSE 2012/Cloud, MCITP EA, MCTS Windows 2008/R2, Exchange 2007 & 2010, Exchange 2010 Enterprise Administrator, MCSE 2003/2000, MCSA Messaging 2003
  Microsoft Certified Trainer
  Microsoft MVP: Directory Services
  Active Directory, Exchange and Windows Infrastructure Engineer

Comments are welcome.

How to Recover a Journal Wrap Error (JRNL_WRAP_ERROR) and a Corrupted FRS SYSVOL from a Good DC – What option do I use, D4 or D2? What’s the Difference between D4 and D2?

Original: 11/21/2013
Updated 8/30/2014

Errata

Ace here again. I’m working on updating all of my blogs. If you see any inconsistencies, please email me and let me know.

Prologue

Are you seeing Event IDs 13508 or 13568, or anything else related to SYSVOL, JRNL_WRAP errors, or NTFRS?

Note – I will not address Event ID 2042 or 1864. Those indicate replication that has not worked beyond the AD tombstone lifetime. If you are seeing them, your best bet is to force-demote the machine, run a metadata cleanup, and re-promote it. To prevent this in the future, make sure your firewall and/or AV allows replication traffic, stop using the ISP’s DNS servers or the router as a DNS address, and disable IP routing and WINS Proxy. And while you’re at it, bump your AD tombstone lifetime up to 180 days.

As for NTFRS, after talking to numerous folks, whether directly assisting a customer or through the TechNet forums, I’ve found there seems to be some confusion about how to handle Journal Wrap errors, what causes them, and what the differences are between the D2 and D4 options. I’ll try to quell this confusion in this blog, and provide easy step-by-step instructions, with an explanation of each step, to get out of this error. Note: The steps are from Microsoft KB290762. I just thought to break them down further so a layman will understand them.

Reference KB: Using the BurFlags registry key to reinitialize File Replication Service Replica Sets
http://support.microsoft.com/kb/290762

For Windows 2008/2008 R2/2012/2012 R2 with DFSR

Follow this KB to fix it:

How to force an authoritative and non-authoritative synchronization for DFSR-replicated SYSVOL (like “D4/D2” for FRS)
http://support.microsoft.com/kb/2218556

Backing Up and Restoring an FRS-Replicated SYSVOL Folder
http://msdn.microsoft.com/en-us/library/windows/desktop/cc507518(v=vs.85).aspx 

What Caused the Journal Wrap?

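First you have to ask yourself: what caused this error on my DC? What did I do to get here? In a nutshell, a journal wrap means FRS has lost track of changes to SYSVOL because the NTFS change (USN) journal wrapped before FRS could process it, which effectively leaves SYSVOL corrupted.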

The usual culprit can be a number of things:

  • Abrupt shutdown/restart. I don’t usually see this unless there are power issues in a building with no power protection or UPS battery system.
  • Disk errors – corrupted sectors. This is a common issue with a DC on older hardware.
  • AV not configured to exclude SYSVOL, NTDS and the AD processes. This is the typical culprit I’ve seen in many cases.

OK, so what do I have to do to fix this?

Getting yourself out of this quandary is rather simple. You might say, “Yeah, right, this is not so simple,” but it really isn’t that hard. It just requires a little understanding of what you have to do: all the process does is copy a good SYSVOL folder and its subfolders from a good DC to the bad DC (the one with the errors).

Basically, you first choose which DC is the good DC to be your “source” DC for the SYSVOL folder. Then you stop the NTFRS service on all DCs. Yes, NTFRS must be stopped on all DCs to perform this. Then set the registry value on the good DC and the bad DC. That’s it. The process takes care of itself and resets the value back to the default after it’s done.

  • If you only have one DC, such as an SBS server, and SYSVOL appears OK (or you restore just SYSVOL from a backup), follow the authoritative (D4) steps outlined below on that DC.
  • If you have more than one DC, but not so many that you can’t stop NTFRS on all of them (40 DCs, for example, would be too many), pick the best one and set BurFlags to D2 on the bad DC and D4 on the good DC.
  • If there are numerous DCs, such as in a large infrastructure, simply force-demote the DC with the error (dcpromo /forceremoval), run a metadata cleanup, and then re-promote it as a DC in the domain. If you unplug the DC and run a metadata cleanup instead, you will have to rebuild the DC from scratch. The /forceremoval switch removes Active Directory from the machine, allowing you to re-promote it.


To summarize:

You have two choices as to a restore from a good DC using FRS:

  1. D2 is set on the bad DC: Non-Authoritative restore: Use the D2 option on the DC with the empty SYSVOL folder, or the SYSVOL folder with the incorrect data. This way it will get a copy of the current SYSVOL and other folders from the good DC that you set the BurFlags D4 option on.
  2. D4 is set on the good DC: Authoritative restore: Use the BurFlags D4 option on the DC that has a copy of the current policies and scripts folder (a good, not corrupted folder).


The BurFlags option – D4 or D2? What do I use?

The steps refer to changing a registry value called BurFlags. If the BurFlags value does not exist, simply create it. It’s a DWORD value.

More importantly, the steps reference changing BurFlags to one of two options: D4 or D2. Therefore, before going further, I would like to clear up the confusion about what the D2 and D4 settings mean:

D2/D4 – Which is which?

  • D2, also known as a Nonauthoritative mode restore – this gets set on the DC with the bad or corrupted SYSVOL
  • D4, also known as an Authoritative mode restore – use this on the DC with the good copy of SYSVOL.
  • You must shut the NTFRS service down on ALL DCs while you’re doing this until instructed to start it.
  • You’ll probably want to copy the current SYSVOL structure on the good DC to another folder as a backup prior to doing this.

The D2 option on the bad DC will do two things:

  1. Copies the current stuff in the SYSVOL folder and puts it in a folder called “Pre-existing.” That folder is exactly what it says it is, it is your current data. This way if you have to revert back to it, you can use the data in this folder.
  2. Then it replicates (copies) good data from the GOOD DC (D4) to the bad guy (D2).

Once again, simply put:

  • The DC with BurFlags set to D4 is “the source DC,” the one whose good SYSVOL folder you want to copy to the bad DC.
  • The bad DC BurFlags is set to D2, which tells it to pull from the source DC, the one you set D4 on.


Here are the steps summarized:

  1. For an Authoritative Restore you must stop the NTFRS services on all of your DCs
  2. In the registry location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process
    1. Set BurFlags to hex “D4” on a known DC that has a good SYSVOL (or, at this point, restore the SYSVOL data from backup and then set BurFlags to D4).
    2. Then start NTFRS on this server.
    3. You may want to rename the old folders with .old extensions before restoring good data.
  3. Clean up the folders on all the remaining servers (Policies, Scripts, etc.) by renaming them with .old extensions.
  4. Set BurFlags to D2 on all remaining servers and then start NTFRS.
  5. Wait for FRS to replicate.
  6. Clean up the .old folders if things look good.
  7. If D4 doesn’t solve the problem, try the D2 value.


So, circling back: to fix this, just copy the contents of SYSVOL to another location as a backup, then follow the KB, which states that you must stop the NTFRS service on ALL DCs. Then pick a good one to be the “source DC.”

Of course, as I’ve stated above, if you have a large number of DCs, the best bet is to force-demote the bad DC, run a metadata cleanup to remove its reference from AD, and then re-promote it.

If you have a small number of DCs, including a good DC and a bad DC, set BurFlags to D4 on the good DC and to D2 on the bad DC.

Example run:

In the example below, if you set BurFlags to D4 on a single domain controller and set BurFlags to D2 on all other domain controllers in that domain, you can rebuild the SYSVOL from the D4 DC (the source DC).

I’ve also heard of admins manually copying the SYSVOL folder and then setting the BurFlags options as mentioned, which works too. But no, I haven’t tested it. That would be for a lab on another day. 🙂

Authoritative Restore Example

Use the BurFlags D4 option on the DC that has a copy of the current policies and scripts folder (a good, not corrupted folder).

  1. Stop the FRS service on all DCs. To do this on all DCs from one DC, you can download PsExec and run “psexec \\otherDC net stop ntfrs” for each DC, one at a time.
  2. On a good DC that you want to be the source, run regedit and go to the following key:
    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup
    In the right pane, double-click “BurFlags” (or right-click it and choose Modify).
    Make sure Base is set to Hexadecimal, type D4, and then click OK.
  3. On the bad DC, run regedit and go to the following key:
    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup
    In the right pane, double-click “BurFlags” (or right-click it and choose Modify).
    Make sure Base is set to Hexadecimal, type D2, and then click OK.
  4. Quit Registry Editor, and then switch to the Command Prompt (which you still have open).
  5. On the good DC, start the FRS service, or in a command prompt, type in “net start ntfrs” and hit <enter>
  6. On the bad DC, start the FRS service, or in a command prompt, type in “net start ntfrs” and hit <enter>
  7. On the bad DC, check the Sysvol folder to see if it started populating.
  8. Check for EventID 13565 which shows the process started
  9. Check for EventID 13516, which shows it’s complete
  10. Start FRS on the other DCs.
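If you are doing this against several DCs, it can help to write the order of operations down before you start. The Python sketch below only builds the command lines for the example above; it does not run anything. It assumes PsExec is available, that remote registry access (reg.exe against \\DC\HKLM) works in your environment, and it uses the registry key from step 2. The DC names are hypothetical, and BurFlags is written in decimal because 212 and 210 are simply 0xD4 and 0xD2.

```python
from typing import List

# BurFlags values: 0xD4 (212) = authoritative, 0xD2 (210) = non-authoritative.
BURFLAGS_D4 = 0xD4
BURFLAGS_D2 = 0xD2
FRS_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
           r"\Backup/Restore\Process at Startup")


def set_burflags(dc: str, value: int) -> str:
    """reg.exe command that writes BurFlags on a remote DC (decimal value)."""
    return f'reg add "\\\\{dc}\\{FRS_KEY}" /v BurFlags /t REG_DWORD /d {value} /f'


def ntfrs(dc: str, action: str) -> str:
    """PsExec command that stops or starts NTFRS on a remote DC."""
    return f"psexec \\\\{dc} net {action} ntfrs"


def burflags_plan(good_dc: str, bad_dcs: List[str], other_dcs: List[str]) -> List[str]:
    """Ordered commands for a D4 (good DC) / D2 (bad DCs) restore."""
    plan = [ntfrs(dc, "stop") for dc in [good_dc, *bad_dcs, *other_dcs]]  # step 1
    plan.append(set_burflags(good_dc, BURFLAGS_D4))                      # step 2
    plan += [set_burflags(dc, BURFLAGS_D2) for dc in bad_dcs]            # step 3
    plan.append(ntfrs(good_dc, "start"))                                 # step 5
    plan += [ntfrs(dc, "start") for dc in bad_dcs]                       # step 6
    # Steps 7-9 are manual: check SYSVOL and wait for events 13565 and
    # 13516 on the bad DCs before starting FRS on the remaining DCs.
    plan += [ntfrs(dc, "start") for dc in other_dcs]                     # step 10
    return plan


for command in burflags_plan("DC1", ["DC2"], ["DC3"]):
    print(command)
```

Printing the plan first gives you a checklist you can review before touching any DC; the waiting in steps 7 to 9 still has to be done by hand.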

After you start the FRS service (NTFRS) following the steps above, the following occurs:

  • The value for BurFlags registry key returns to 0.
  • Files in the reinitialized FRS folders are moved to a Pre-existing folder.
  • An event 13565 is logged to signal that a nonauthoritative restore is started.
  • The FRS database is rebuilt.
  • The member replicates (copies) the SYSVOL folder from the GOOD DC.
  • The reinitialized computer runs a full replication of the affected replica sets when the relevant replication schedule begins.
  • When the process is complete, an event 13516 is logged to signal that FRS is operational. If the event is not logged, there is a problem with the FRS configuration.
    Note: The placement of files in the Pre-existing folder on reinitialized members is a safeguard in FRS designed to prevent accidental data loss. You can copy this data back if the process doesn’t work, but I have not yet seen a case where it didn’t work!

Summary

I hope this helps clean up your FRS and SYSVOL replication issues.

Ace Fekay
MVP, MCT, MCSE 2012, MCITP EA & MCTS Windows 2008/R2, Exchange 2013, 2010 EA & 2007, MCSE & MCSA 2003/2000, MCSA Messaging 2003
Microsoft Certified Trainer
Microsoft MVP – Directory Services
Complete List of Technical Blogs: http://www.delawarecountycomputerconsulting.com/technicalblogs.php

This blog is provided AS-IS with no warranties or guarantees and confers no rights.

DNS Zone Types Explained, and their Significance in Active Directory

==================================================================
==================================================================
Ace Fekay, MCT, MVP, MCSE 2012/Cloud, MCITP EA, MCTS Windows 2008/R2, Exchange 2007 & 2010, Exchange 2010 Enterprise Administrator, MCSE 2003/2000, MCSA Messaging 2003
   Microsoft Certified Trainer
   Microsoft MVP: Directory Services
   Active Directory, Exchange and Windows Infrastructure Engineer and Janitor

Revisions

Original publication 4/30/2013

Prelude

Ace here again. I thought to touch base on DNS zones, and more so, focus on what AD integrated zones are and how they work. This blog almost mimics my class lecture on this topic. Check back for updates periodically, which I will notate with a timestamp above with whatever I’ve added or modified.

This topic was also briefly discussed in the following Microsoft Technet forum thread:
Technet thread: “Secondary Zones?”
http://social.technet.microsoft.com/Forums/en-US/winserverNIS/thread/c1b0f3ac-c8af-4f4e-a5bc-23d034c85400


AD Integrated Zones: AD Database Storage Locations

First up is a background on the various parts of the Active Directory database and what gets stored in them. This will help understand where DNS data is stored as I discuss it later in this blog.

The Active Directory Data Store (the AD database):

There are three possible storage locations for DNS zone storage in the Active Directory database:

  • DomainNC – This was the only available location with Windows 2000. This replicates to all DCs only in a specific domain.
  • DomainDnsZones partition – Introduced in Windows 2003 and used in all newer operating systems. This replicates to all DCs only in a specific domain in the forest.
  • ForestDnsZones partition – This replicates to all DCs in the forest.
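To make the three locations concrete, this Python sketch builds the distinguished names of the MicrosoftDNS containers that hold AD integrated zones in each location. The container names are the standard ones, but the helper itself is hypothetical, and contoso.com and emea.contoso.com are example names. For a child domain, DomainDnsZones and the DomainNC follow the child domain, while ForestDnsZones always sits under the forest root domain.

```python
from typing import Dict


def fqdn_to_dn(fqdn: str) -> str:
    """Convert 'contoso.com' to 'DC=contoso,DC=com'."""
    return ",".join(f"DC={label}" for label in fqdn.split("."))


def dns_zone_containers(domain_fqdn: str, forest_root_fqdn: str) -> Dict[str, str]:
    """Container DN for AD integrated zones in each storage location."""
    domain_dn = fqdn_to_dn(domain_fqdn)
    forest_dn = fqdn_to_dn(forest_root_fqdn)
    return {
        # Windows 2000-style storage in the domain partition itself.
        "DomainNC": f"CN=MicrosoftDNS,CN=System,{domain_dn}",
        # Domain-wide application partition (Windows 2003 and later).
        "DomainDnsZones": f"CN=MicrosoftDNS,DC=DomainDnsZones,{domain_dn}",
        # Forest-wide application partition, under the forest root domain.
        "ForestDnsZones": f"CN=MicrosoftDNS,DC=ForestDnsZones,{forest_dn}",
    }


for location, dn in dns_zone_containers("emea.contoso.com", "contoso.com").items():
    print(f"{location}: {dn}")
```

Each zone is a dnsZone object named after the zone (for example, DC=contoso.com) inside one of these containers, which is where you would look for it with ADSI Edit.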

Not all partitions replicate forest-wide; the replication scope depends on the partition.

Ok, Now the DNS Basics:

  • A Secondary is a read-only copy
  • A Secondary zone stores its data in a text file (by default in the system32\dns folder)
  • A Secondary gets a copy of the zone data from the Primary
  • A Primary is the writeable copy
  • A Primary stores its zone data in a text file (by default in the system32\dns folder)
  • There can only be one Primary, but as many Secondary zones as you want.
  • You must allow zone transfer capabilities from the Primary zone if you want to create a Secondary.
  • AD integrated zones do NOT need zone transfers to be allowed (see below for specifics)

Active Directory Integrated zones change this a bit:

AD Integrated zones are similar to Primary zones; however, their data is stored as binary data in the actual AD database and not as a text file. Where in the AD database a zone is stored depends on the DC’s operating system version and the zone’s replication scope, meaning the “logical” part of the physical AD database it’s stored in, which determines which DCs in the forest it replicates to.

  • The “only one Primary Zone” rule is changed by the introduction of the Multi-Master Primary feature. This is because the data is not stored as a text file; rather, it is stored in the actual, physical AD database (in one of 3 different logical locations, or what we call the Replication Scope), and any DC that has DNS installed (based on the replication scope) will be a writeable copy.
  • The zone data is replicated to other DCs in the replication scope where the data is stored (based on one of the 3 logical locations)
  • Each DC in the replication scope that has DNS installed, will automatically make available the zone data in DNS
  • Each DC that hosts the zone can “write” to the zone, and the changes get replicated to other DCs in the replication scope of the zone.
  • The DC that makes a change becomes the SOA at that point in time, until another DC makes a change to the zone, then it becomes the SOA
  • An AD Integrated zone can be configured to allow zone transfers to a Secondary, but the Secondary CANNOT be a DC in the same replication scope as the zone you are trying to create as a Secondary, otherwise the DC you are attempting to create the Secondary on will automatically change it to AD integrated, since it “sees” it in the AD database. In some cases, if this is forced or done incorrectly, it can lead to duplicate or conflicting zones in the AD database, which is problematic until fixed.

And if you install DNS on another DC, the zone data will *automatically* appear because DNS will recognize the data in the AD database. AD integrated zones can also act as a Primary zone for secondary zones, whether they are on Windows machines, BIND (on Unix) or any other name brand.

Remember, AD integrated zones still follow the RFCs, but have more features.


Duplicate or Conflicting zones?

Since I touched on duplicate and conflicting zones, you may want to check whether they exist in your AD database. You have to check each partition, and if you have more than one domain, you have to check the DomainDnsZones partition and the DomainNC of each domain. You may even have to check multiple DCs in various AD sites to see whether they all “see” the same copy or different copies. You would be surprised how often, with AD replication problems, I’ve seen different DCs “see” something different in their own databases. This issue also manifests as more than just a DNS problem, for example when you create a user on one DC and it never replicates to another DC.

Using ADSI Edit to Resolve Conflicting or Duplicate AD Integrated DNS zones
http://msmvps.com/blogs/acefekay/archive/2009/09/02/using-adsi-edit-to-resolve-conflicting-or-duplicate-ad-integrated-dns-zones.aspx

 

Primary Standard Zone, Secondary Standard Zones & Zone Transfers

Zone transfers allow you to create a read-only copy (a Secondary zone) on another DNS server, which pulls (transfers) a copy of the zone from the read/write zone (the Primary zone).
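How does a Secondary know there is something new to pull? It compares the SOA serial number on the Primary with its own copy and transfers only when the Primary’s serial is newer. Serials are 32-bit values compared with RFC 1982 serial arithmetic, so they can wrap around. The sketch below shows just that comparison; the function names are illustrative, and real DNS servers also handle refresh/retry timers and NOTIFY, which are not modeled here.

```python
# RFC 1982 serial number arithmetic for the 32-bit SOA serial field.
SERIAL_BITS = 32
HALF = 2 ** (SERIAL_BITS - 1)


def serial_newer(candidate: int, current: int) -> bool:
    """True if `candidate` is 'greater than' `current` per RFC 1982 (handles wraparound)."""
    return (current < candidate and candidate - current < HALF) or (
        current > candidate and current - candidate > HALF
    )


def secondary_should_transfer(primary_serial: int, secondary_serial: int) -> bool:
    """A Secondary pulls the zone only when the Primary's serial is newer than its own."""
    return serial_newer(primary_serial, secondary_serial)


if __name__ == "__main__":
    print(secondary_should_transfer(2014041601, 2014041600))  # primary changed
    print(secondary_should_transfer(2014041600, 2014041600))  # already current
    print(secondary_should_transfer(5, 4294967290))           # serial wrapped around
```

The wraparound case is why a plain `>` comparison is not enough: serial 5 counts as newer than 4294967290.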

Primary and Secondary zones store their data as text files.

On a Windows machine, the zone files can be found in the %SystemRoot%\System32\dns folder with a file name such as “domain.com.dns”. You can have numerous read-only copies, but there can be only one read/write copy of that zone.

Please keep in mind, the DNS server listed at the registrar as authoritative for a public domain name (zone) does not have to be the Primary; it’s just the nameserver listed as authoritative. It can get its data from a Primary that is not listed, so the writable copy is hidden and protected from public access.

Do I need Zone Transfers Allowed for AD Integrated Zones if I do not have Secondary Zones?

The short answer: NOPE.

The reason is that the term “AD Integrated” means the zone is stored in the AD database, and the zone will replicate to other domain controllers within the same replication scope (domain-wide or forest-wide) automatically as part of the AD replication process.

By default, AD integrated zones are configured to not allow zone transfers.

Allowing zone transfers is an option provided to support Secondary zones hosted on non-DC DNS servers, whether those are Windows DNS servers that are not DCs, BIND, or any other vendor’s DNS server.

Rotating SOA

Additional security options are one feature of AD integrated zones; another is that there can be more than one Primary (writable) copy of the zone. This is because every DNS server that hosts the zone in a domain or forest holds a writable copy, and becomes the actual “start of authority” (SOA) of that zone when that specific DC/DNS accepts a write operation, such as a client machine registering, or the DC itself updating its SRV records.

For example, when a DC updates its SRV and other records at the default 60-minute interval (all other machines register every 24 hours), it sends the update to the DNS server listed as the first (preferred) DNS address on its network card. That server writes the change into DNS and NOW becomes the SOA of the zone. The data is replicated to the other DC/DNS servers by default AD replication, and then all the other DC/DNS servers see the change.

To further explain, since the zone is AD integrated, every DC in the replication scope of the zone can accept changes, due to an AD integrated zone’s multi-master Primary zone feature. An SOA is the DNS server that is authoritative to accept writes; therefore, whichever DC/DNS accepted a change to the zone becomes the SOA at that moment in time. When the next DC/DNS accepts a change, it becomes the new SOA. A constantly changing SOA in an AD environment is expected, default behavior.

That is why you can watch the SOA name on AD integrated zones change. The data is replicated automatically as part of the AD replication process because it is stored in the AD database.

Active Directory-integrated DNS zone serial number behavior (SOA default behavior) 
http://support.microsoft.com/kb/282826 

 

References

Configure AD Integrated Zones
(When converting to AD integrated zones)
Quoted: “Only primary zones can be stored in the directory. If a zone is configured on other domain controllers as a secondary zone, these zones will be converted to primary zones when you convert the zone to AD integrated. This is because the multimaster replication model of Active Directory removes the need for secondary zones when a zone is stored in Active Directory. Conversion of the zone from secondary to primary will occur when AD DS is restarted.”
 http://technet.microsoft.com/en-us/library/ee649181(v=ws.10)

Understanding DNS Zones
http://www.tech-faq.com/understanding-dns-zones.html

Understanding stub zones: Domain Name System(DNS)
http://technet.microsoft.com/en-us/library/cc779197(v=ws.10).aspx

AD Site Design and Auto Site Link Bridging, or Bridge All Site Links (BASL)

By Ace Fekay, MCT, MVP, MCSE 2012/Cloud, MCITP EA, MCTS Windows 2008/R2, Exchange 2007, 2010 & 2013, Exchange 2013, Exchange 2010 Enterprise Administrator, MCSE 2003/2000, MCSA Messaging 2003
  Microsoft Certified Trainer
  Microsoft MVP: Directory Services
  Active Directory, Exchange and Windows Infrastructure Engineer

Updated 12/12/2013

Preface

Ace here again with something I really would like to discuss, since this topic comes up from time to time.

To properly design an AD multi-site infrastructure, there are a few things that need to be taken into account. I won’t bore you with all the background techno babble; rather, I’m going to give you a no-nonsense, down-to-business explanation of why you need to either keep Auto Site Link Bridging enabled or disable it, which depends on your physical routed topology design.

AD Sites

First, it’s important to have a basic understanding of Active Directory Sites before I go further.

Some of the biggest questions I hear about AD Sites are:

  • What are AD Sites?
  • What are AD Sites for?
  • Why can’t I create an AD Site without a domain controller in the Site?

These are all valid questions. A little research will usually result in an answer, but you may have to dig through piles of technical details to get to it. Let’s address each one:

What are AD Sites?

An AD Site defines a highly-connected physical network location in Active Directory. We define Sites by IP subnet or subnets. And yes, you can have multiple subnets that are highly connected by routers within a location. In some cases, for example if you have a very high-speed backbone between locations, such as an OC-1 (51.84 Mbps) or faster, you can put all those subnets in one AD Site. However, in many cases we probably don’t want to do that. Hang in there, I’ll be getting to that in a few minutes.

What are AD Sites for?

AD sites are basically used for two things:

  1. To facilitate service localization. In simple English, this means to control logon and authentication traffic to DCs in a specific location, or Site. After all, we don’t want a client in NYC to pick a DC in Seattle, Japan, or somewhere else, to send its logon or authentication request (such as when accessing a folder), do we? Nope. The client side DC Locator process will find a DC in its own Site by using the client’s IP address.
  2. To manage DC replication traffic. In simple English, this means to control DC replication traffic across WAN links between the bridgeheads (the DCs in a Site that communicate with DCs in other Sites). By default, replication traffic between Sites is compressed to about 15% of its uncompressed size. And with Sites, we can control how often replication occurs and when we allow it to happen.

Why can’t I create an AD Site without a DC in the Site?

Good question. If you look at what AD Sites are for, then it should be pretty obvious that you need a DC in it. After all, if there is no DC in it, and a client picks a Site based on its IP subnet, then looks for a DC, it won’t find any, will it? Nope, so it may wind up randomly picking a DC in another location, such as that DC in Seattle or Japan.

DC Locator Process

I don’t want to dwell on this, but I will briefly mention it because this is part of the reason why we want to create AD Sites anyway.

There is a process that a client uses to pick a DC. Here’s a quick view of how a client picks a DC. I should add a #9 to the list for the scenario where no DC exists in the Site: the client then relies on Automatic Site Coverage, and this works ONLY if you created an IP Site link to another Site whose DCs you want to cover that Site.

If you didn’t create an IP Site link for a Site that has no DCs, DC selection becomes a more or less random process, driven by other factors such as subnet netmask ordering and round robin. If you want to read up on this subject, here are two good TechNet Forum discussions on it:

Briefly, here are the DC Locator process steps, quoted directly from “How Domain Controllers Are Located in Windows XP”:

  1. Client does a DNS search for DC’s in _LDAP._TCP.dc._msdcs.domainname
  2. DNS server returns list of DC’s.
  3. Client sends an LDAP ping to a DC asking for the site it is in based on the clients IP address (IP address ONLY! The client’s subnet is NOT known to the DC).
  4. DC returns…
    1. The client’s site or the site that’s associated with the subnet that most matches the client’s IP (determined by comparing just the client’s IP to the subnet-to-site table Netlogon builds at startup).
    2. The site that the current domain controller is in.
    3. A flag (DSClosestFlag=0 or 1) that indicates if the current DC is in the site closest to the client.
  5. The client decides whether to use the current DC or to look for a closer option.
    1. Client uses the current DC if it’s in the client’s site or in the site closest to the client as indicated by DSClosestFlag reported by the DC.
    2. If DSClosestFlag indicates the current DC is not the closest, the client does a site specific DNS query to: _LDAP._TCP.sitename._sites.domainname (_LDAP or whatever service you happen to be looking for) and uses a returned domain controller.
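Step 4.1 is the heart of the Site decision: the DC compares the client’s IP address against the subnet-to-site table and returns the site whose subnet most closely matches. The sketch below assumes “most closely matches” means the most specific (longest-prefix) subnet containing the address; the subnet table and function name are hypothetical, and Netlogon’s own implementation is not reproduced here.

```python
import ipaddress
from typing import Dict, Optional


def site_for_client(client_ip: str, subnet_to_site: Dict[str, str]) -> Optional[str]:
    """Return the site whose subnet most specifically contains client_ip, else None."""
    ip = ipaddress.ip_address(client_ip)
    best_site: Optional[str] = None
    best_prefix = -1
    for cidr, site in subnet_to_site.items():
        net = ipaddress.ip_network(cidr)
        # A longer prefix (e.g. /24 over /16) is the more specific match.
        if ip.version == net.version and ip in net and net.prefixlen > best_prefix:
            best_site, best_prefix = site, net.prefixlen
    return best_site


if __name__ == "__main__":
    # Hypothetical subnet objects, as they might be defined in AD Sites and Services.
    subnets = {
        "10.1.0.0/16": "NYC",
        "10.1.20.0/24": "Chicago",
        "10.2.0.0/16": "Seattle",
    }
    for addr in ("10.1.20.5", "10.1.5.5", "192.168.1.10"):
        print(addr, "->", site_for_client(addr, subnets))
```

The last address maps to no Site at all, which is exactly the client whose subnet was never associated with a Site.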

Brief overview:


Let me point out again that if there are no DCs in a Site, Automatic Site Coverage will take over.

To me, it’s a process to “find” a DC that will authenticate users in a Site without a DC. However, I would rather associate the location’s subnet with an existing Site so the client doesn’t have to go through this process. Besides, there are scenarios where not having a DC in a Site can directly affect directory-enabled applications and services, such as DFS site referrals, SCCM, or Exchange with its heavy dependency on GCs and DSAccess.

Here’s the Automatic Site Coverage algorithm, directly quoted from the TechNet article “How DNS Support for Active Directory Works”:

  1. Build a list of target sites — sites that have no domain controllers for this domain (the domain of the current domain controller).
  2. Build a list of candidate sites — sites that have domain controllers for this domain.
  3. For every target site, follow these steps:
    1. Build a list of candidate sites of which this domain is a member. (If none, do nothing.)
    2. Of these, build a list of sites that have the lowest site link cost to the target site. (If none, do nothing.)
    3. If more than one, break ties (reduce this list to one candidate site) by choosing the site with the largest number of domain controllers.
    4. If more than one, break ties by choosing the site that is first alphabetically.
    5. Register target-site-specific SRV records for the domain controllers for this domain in the selected site.
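The tie-breaking order in steps 3.2 through 3.4 can be sketched as a single sort key: lowest site link cost, then most DCs, then alphabetical. This is a minimal sketch under stated assumptions: it only considers site links that directly connect a candidate Site to the target Site, the input dictionaries and function name are hypothetical, and it ignores how each DC actually registers the target-site SRV records in step 3.5.

```python
from typing import Dict, FrozenSet, Optional


def automatic_site_coverage(
    dc_counts: Dict[str, int],
    link_costs: Dict[FrozenSet[str], int],
) -> Dict[str, Optional[str]]:
    """Map each site with no DCs for the domain to the site that would cover it.

    dc_counts:  site name -> number of DCs for this domain in that site
    link_costs: frozenset({site_a, site_b}) -> cost of the site link between them
    """
    # Step 1: target sites have no DCs for this domain.
    targets = sorted(site for site, count in dc_counts.items() if count == 0)
    # Step 2: candidate sites have at least one DC for this domain.
    candidates = [site for site, count in dc_counts.items() if count > 0]

    coverage: Dict[str, Optional[str]] = {}
    for target in targets:
        # Steps 3.1-3.2: candidate sites that share a site link with the target.
        linked = {
            site: link_costs[frozenset((site, target))]
            for site in candidates
            if frozenset((site, target)) in link_costs
        }
        if not linked:
            coverage[target] = None  # "If none, do nothing."
            continue
        # Steps 3.2-3.4: lowest cost, then most DCs, then first alphabetically.
        coverage[target] = min(
            linked, key=lambda site: (linked[site], -dc_counts[site], site.lower())
        )
    return coverage


if __name__ == "__main__":
    dc_counts = {"NYC": 3, "Chicago": 2, "Seattle": 2, "Miami": 0}
    links = {
        frozenset({"NYC", "Miami"}): 100,
        frozenset({"Chicago", "Miami"}): 100,
        frozenset({"Seattle", "Miami"}): 200,
    }
    # NYC and Chicago tie on cost; NYC wins because it has more DCs.
    print(automatic_site_coverage(dc_counts, links))
```

If NYC and Chicago also had the same number of DCs, the alphabetical rule would hand Miami to Chicago.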

If there are no DCs in a Site, you can use PowerShell to figure out which DC in which Site will be picked. If you like, you can further read up on the commands used to figure this out in Sean Ivey’s blog:

Sites Sites Everywhere…, By Sean Ivey, Microsoft DS PFE
http://blogs.technet.com/b/askds/archive/2011/04/29/sites-sites-everywhere.aspx

So wouldn’t you want your clients to pick a DC in their own Site?

Moving forward, do we really want a client to pick a DC in some other Site, or go through the Automatic Site Coverage process? I’m sure you already know the answer to that.

Therefore, if you have a location that has no DCs, simply create an IP subnet object and associate it with an existing AD Site that you want those users to use. In this case, you might base your pick on the Site connected by the fastest WAN link, or by the only WAN link.

And if there are any subnets that are not associated with an AD Site, then any DC is game to authenticate a client, as seen in the process above. To find clients whose subnets are not defined in AD Sites and Services, among other things, enable Netlogon logging and check the %SystemRoot%\Debug\netlogon.log file. Here’s more info:

Enabling debug logging for the Net Logon service, Last Review: May 3, 2011 – Revision: 11.0, Applies to: all operating systems.
 http://support.microsoft.com/kb/109626
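Clients without a Site show up in netlogon.log as NO_CLIENT_SITE entries followed by the computer name and IP address. The sketch below tallies those entries per candidate subnet so you can see which subnet objects are missing. It assumes the entry shape shown in the sample lines (the sample data and function name are hypothetical), and it assumes /24 boundaries, which may not match your real subnets.

```python
import ipaddress
import re
from collections import Counter
from typing import Iterable

# Assumed entry shape: "... DOMAIN: NO_CLIENT_SITE: <computer> <ip address>"
NO_CLIENT_SITE = re.compile(r"NO_CLIENT_SITE:\s+(\S+)\s+(\S+)")


def unmapped_subnets(lines: Iterable[str], prefix_len: int = 24) -> Counter:
    """Count NO_CLIENT_SITE hits per candidate subnet (client IP truncated to prefix_len)."""
    counts: Counter = Counter()
    for line in lines:
        match = NO_CLIENT_SITE.search(line)
        if not match:
            continue
        try:
            ip = ipaddress.ip_address(match.group(2))
        except ValueError:
            continue  # skip entries whose address field does not parse
        subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(subnet)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "04/16 09:12:01 CONTOSO: NO_CLIENT_SITE: WKS01 10.5.1.20",
        "04/16 09:14:37 CONTOSO: NO_CLIENT_SITE: WKS02 10.5.1.33",
        "04/16 09:20:12 CONTOSO: NO_CLIENT_SITE: LAP07 10.6.7.8",
        "04/16 09:21:00 CONTOSO: some unrelated entry",
    ]
    # On a DC you might instead pass: open(r"C:\Windows\Debug\netlogon.log", errors="replace")
    for subnet, hits in unmapped_subnets(sample).most_common():
        print(subnet, hits)
```

Each subnet it reports is a candidate for a new subnet object associated with the appropriate Site.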

Auto Site Link Bridging

This now brings us to bridging, what it is, etc.

Within an AD Site, the KCC (Knowledge Consistency Checker) automatically assumes that all DCs can directly reach each other, and creates intrasite replication partnerships between the DCs in the Site. The one point I want to be clear about is that no matter how many DCs are in a Site, and there can be hundreds, the KCC makes sure the partnerships are created so that every DC in the Site receives any change made by any other DC in the Site within 15 minutes. If you add a new DC to the Site, the KCC jumps in, evaluates the new guy, and adds it so it gets updated data from the other DCs in under 15 minutes. How does it do that? It follows a set algorithm, but that is beyond this discussion.

Now consider multiple Sites, more specifically three or more, keeping in mind that by default AD assumes all the Sites have direct physical connectivity and can communicate with each other. This means it assumes you can literally ping a DC in any Site from any other Site.

Here’s where the ISTG (Intersite Topology Generator) kicks in. The ISTG is a component of the KCC. It evaluates the overall topology, and builds connection objects between servers in each of the sites to enable Intersite replication— DC replication between sites.

Here’s a fully routed infrastructure.

 

If remote sites cannot directly communicate with each other, only with the hub site

However, if your physical network topology is designed so that the sites do not have direct communications with each other, and you leave the default “Auto Site Link Bridge” setting enabled as is, then lots of things will go wrong, such as replication problems, duplicate AD integrated zones, and more … keep reading. I won’t address duplicate zones here; for more on that, see “Using ADSI Edit to Resolve Conflicting or Duplicate AD Integrated DNS zones.”

If the network topology was hub and spoke, BASL wasn’t disabled, and individual site links between the hub and each site weren’t created until recently, then there may be replication problems. That is a whole different subject. What I can say is that, besides checking for duplicate zones as I mentioned in the previous paragraph, I would also run the Active Directory Replication Status Tool to check replication status. It provides a report, and anything amiss shows up in red. Pretty cool tool. Download it here:

Download The Active Directory Replication Status Tool (ADREPLSTATUS):
   http://www.microsoft.com/en-us/download/details.aspx?id=30005
     Note: This tool requires .NET Framework 4. If it’s not installed, download and install it:
       Microsoft .NET Framework 4 (Web Installer)
       http://www.microsoft.com/en-us/download/details.aspx?id=17851

Remember, by default the KCC assumes all sites can directly communicate; therefore, it will create partnerships between bridgeheads in all sites. And any DC in a site can automatically become a bridgehead.

So if corporate headquarters is in NYC, and you have three remote locations, Miami, Chicago and Seattle, and direct communications does not exist, meaning that each remote location can only communicate with headquarters, and IP routing has not been configured between the remote locations, and the KCC creates a connection object (partnership) between a DC in Miami and a DC in Seattle, what will happen?

Since they can’t directly communicate, replication fails. And if the Seattle partnership is the only connection object Miami has, while Seattle has one to NYC, then, keeping in mind that replication is a PULL, Seattle will receive replication from NYC, but Miami can’t pull anything from Seattle because there is no direct or indirect communication between them. So Miami winds up on an isolated island.
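The Miami scenario can be modeled as a simple graph walk: a change made in NYC reaches a site only if that site has an inbound connection object to a site that already has the change, and only if the two sites can actually route to each other. This is a toy model, not the KCC or ISTG algorithm; the site names come from the example above, and the function name and connection lists are hypothetical.

```python
from collections import deque
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def sites_reached(
    origin: str,
    physical_links: Iterable[Pair],
    connections: Iterable[Pair],
) -> Set[str]:
    """Sites that eventually pull a change made in `origin`.

    physical_links: pairs of sites with a working routed path between them.
    connections:    (destination, source) inbound connection objects; the
                    destination pulls from the source.
    """
    routed = {frozenset(pair) for pair in physical_links}
    # A connection object only works if the two sites can actually talk.
    working = [(dst, src) for dst, src in connections if frozenset((dst, src)) in routed]
    reached = {origin}
    queue = deque([origin])
    while queue:
        source = queue.popleft()
        for dst, src in working:
            if src == source and dst not in reached:
                reached.add(dst)
                queue.append(dst)
    return reached


if __name__ == "__main__":
    hub_and_spoke = [("NYC", "Miami"), ("NYC", "Chicago"), ("NYC", "Seattle")]
    # BASL enabled: the KCC may give Miami a partner (Seattle) it cannot reach.
    basl_on = [("Seattle", "NYC"), ("Chicago", "NYC"), ("Miami", "Seattle")]
    # BASL disabled with a NYC-Miami site link: Miami pulls from the hub.
    basl_off = [("Seattle", "NYC"), ("Chicago", "NYC"), ("Miami", "NYC")]
    print(sorted(sites_reached("NYC", hub_and_spoke, basl_on)))
    print(sorted(sites_reached("NYC", hub_and_spoke, basl_off)))
```

With the first set of connections Miami never appears in the result, which is the “isolated island” described above; with hub-to-spoke connections every site is reached.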

From the view of Miami’s DC, no one wants to talk to it, so it will complain (you will see multiple event log errors) that its partners haven’t replicated with it. And the DCs in the other sites will all think the same thing about Miami.

So who’s right? Of course, they all are. If the lack of replication goes beyond the AD tombstone lifetime, then Miami would need to be demoted. Then again, you can’t even do that, because it doesn’t have direct communications with its partner. And if it does pick a DC in headquarters to demote against, you will see an error stating that the headquarters DC already thinks the DC no longer exists. Whether you are trying to demote it or even force-demote it beyond the tombstone lifetime, your only option is to unplug it, run a metadata cleanup and re-promote it. But wait, the same thing will happen again if you don’t disable BASL.
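Whether Miami has crossed that line is simple date arithmetic: compare the time since its last successful inbound replication with the forest’s tombstone lifetime. The sketch below assumes a 180-day lifetime, which is typical for forests created on Windows Server 2003 SP1 or later (older forests may use 60 days); read the tombstoneLifetime attribute in your own forest rather than trusting this default. The dates and function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: 180 days is typical for forests created on Windows Server 2003 SP1
# or later; older forests may use 60. Check your forest's tombstoneLifetime value.
DEFAULT_TOMBSTONE_DAYS = 180


def days_without_replication(last_success: datetime, now: datetime) -> int:
    """Whole days since the last successful inbound replication."""
    return (now - last_success).days


def past_tombstone(last_success: datetime, now: datetime,
                   tombstone_days: int = DEFAULT_TOMBSTONE_DAYS) -> bool:
    """True if the DC has gone longer than the tombstone lifetime without replicating."""
    return now - last_success > timedelta(days=tombstone_days)


if __name__ == "__main__":
    last = datetime(2013, 10, 1)   # hypothetical last successful inbound replication
    today = datetime(2014, 4, 16)
    print(days_without_replication(last, today))   # 197
    print(past_tombstone(last, today))              # True
```

Once this returns True for a DC, the demote/metadata-cleanup/re-promote path above is what you are facing.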

So is it right that we do the same thing over and over and expect different results? Nope. Let’s configure AD to make sure it will not happen again, by disabling BASL.

Here’s a non-fully routed infrastructure.

Disable BASL

Simply put, what we need to do is disable BASL (Bridge All Site Links) in a non-fully routed infrastructure to tell the KCC to only partner DCs across a specific site link.

Yes, that means you also have to create specific IP site links between headquarters in NYC and each site, as the image above shows.

And even if you have 20 sites that are all fully routed EXCEPT for one of them, the same thing applies. You must disable BASL because of that one site; otherwise the KCC may partner a DC with another DC it has no direct communications with.

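How to disable BASL: in Active Directory Sites and Services, expand Sites, then Inter-Site Transports, right-click IP, choose Properties, and clear the “Bridge all site links” check box.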

Summary

If you want to make sure your AD infrastructure is purring along and doing its job, then by all means design it properly and make the necessary changes to get it going in the right direction.

Oh, and don’t forget to bone up on your DNS knowledge and how it supports AD. Each DC’s Netlogon service registers its Site-specific SRV records in DNS. Read more:

How DNS Support for Active Directory Works
http://technet.microsoft.com/en-us/library/cc759550(WS.10).aspx

And to understand the DNS SRV records registered by a DC’s Netlogon service, with DNS SRV record examples, once again I refer you to Sean Ivey’s blog:

Sites Sites Everywhere…, By Sean Ivey, Microsoft DS PFE
http://blogs.technet.com/b/askds/archive/2011/04/29/sites-sites-everywhere.aspx

References

Designing the Site Topology
http://technet.microsoft.com/en-us/library/cc787284(WS.10).aspx

Detailed branch office deployment guide (downloadable doc)
http://www.microsoft.com/downloads/details.aspx?FamilyId=9353A4F6-A8A8-40BB-9FA7-3A95C9540112&displaylang=en

Best Practice Active Directory Design for Managing Windows Networks
http://technet.microsoft.com/en-us/library/bb727085.aspx

You may want to take a look at the design IPD guide (Infrastructure Planning and Design) for AD – Download Details: IPD guide for Active Directory Domain Services – version 1.0
http://www.microsoft.com/en-us/download/details.aspx?id=732

Download the complete Infrastructure Planning and Design (IPD) Guide Series v2.0 including links for AD IPD, SCCM IPD, and more.
http://technet.microsoft.com/en-us/library/cc196387.aspx

Comments & Corrections are welcomed.

Ace Fekay