Active Directory Replication Overview & USN Rollback: What It Is & How It Happens

If you have ever seen Event ID 2095, you already know how badly a USN Rollback can damage Active Directory consistency.

What is a USN?

The USN (Update Sequence Number) is a counter kept by each Active Directory database instance; it increments every time a change is committed to the AD database on a Domain Controller. The USN is unique to each DC and has no correlation to the USN on any other DC (this doesn’t matter, as you will see later in this article). Active Directory replication keeps track of every Domain Controller’s USN and uses this information to determine when replication is required. Active Directory has two basic types of writes to the AD database: a replicated write (where the change was originally performed on another DC) and an originating write (where the change is performed on the local DC). AD replication tracks which changes were performed on which DCs and which of those changes have already been replicated out.

Every Domain Controller has a server object in the site container stored within the Configuration partition. This object has a child object called “NTDS Settings” which has a GUID (Globally Unique Identifier) attribute that is replicated as part of replication metadata and used by the KCC to build the replication topology. Each DC has its own copy of the Active Directory database stored in the ntds.dit file, and this unique database instance on a DC is identified with its own GUID-type identifier called the “Invocation ID”. The Invocation ID is created when the DC is promoted and only changes when the AD database is restored using a supported method or an application partition is added or removed. The reason for this is that when an AD database is restored to an earlier point in time, the USN is also restored to that point in time. Without a new Invocation ID, any change made between the restored USN value and the original, pre-restore USN value would be ignored by other DCs pulling replication from the restored DC (they track the USNs of the DCs they replicate with and only pull updates when the source DC’s USN increments above the last USN they have stored for it; more on this later). To avoid this situation, the DC’s AD database generates a new Invocation ID, and the old Invocation ID is stored in an attribute on the server’s NTDS Settings object called retiredReplDSASignatures. In this manner, the other DCs treat the new Invocation ID as a new database and make sure they get updates from it moving forward. You can see a DC’s Invocation ID and Server GUID by running repadmin /showrepl.
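
The difference between a supported restore and a plain image copy can be sketched with a toy model. This is only an illustration in Python, not an Active Directory API: the class DcDatabase and its method names are made up for this example, and the retired list stands in loosely for retiredReplDSASignatures.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DcDatabase:
    """Toy model of one DC's database instance (hypothetical, not an AD API)."""
    invocation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    usn: int = 0
    retired_signatures: list = field(default_factory=list)

    def commit_change(self) -> int:
        """Every committed change bumps the local USN by one."""
        self.usn += 1
        return self.usn

    def snapshot(self) -> tuple:
        """Capture the state that a backup or image would hold."""
        return (self.invocation_id, self.usn)

    def supported_restore(self, backup: tuple) -> None:
        """USN goes back, the old Invocation ID is retired, a new one is minted."""
        _, backup_usn = backup
        self.retired_signatures.append(self.invocation_id)
        self.invocation_id = str(uuid.uuid4())
        self.usn = backup_usn

    def unsupported_restore(self, backup: tuple) -> None:
        """Image/snapshot copy: USN goes back, Invocation ID stays the same."""
        self.invocation_id, self.usn = backup


db = DcDatabase()
for _ in range(5):
    db.commit_change()
backup = db.snapshot()          # taken at USN 5
for _ in range(3):
    db.commit_change()          # USN is now 8

old_id = db.invocation_id
db.supported_restore(backup)
print(db.usn)                           # 5
print(db.invocation_id != old_id)       # True: partners see a "new" database
print(old_id in db.retired_signatures)  # True
```

Calling unsupported_restore instead leaves the Invocation ID unchanged while the USN still drops back to 5, which is exactly the condition the rest of this article describes.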

Much of the DC information is visible when running the Get-ADDomainController PowerShell cmdlet:

PS C:\> Import-Module ActiveDirectory
PS C:\> Get-ADDomainController -Identity "ADSecurityORGDC01.ADSecurity.org"

ComputerObjectDN           : CN=ADSECURITYORGDC01,OU=Domain Controllers,DC=ADSecurity,DC=org
DefaultPartition           : DC=ADSecurity,DC=org
Domain                     : ADSecurity.org
Enabled                    : True
Forest                     : ADSecurity.org
HostName                   : ADSecurityORGDC01.ADSecurity.org
InvocationId               : f2e96b87-bb2a-4572-a911-0c79721e7b4f
IPv4Address                : 10.10.10.11
IPv6Address                :
IsGlobalCatalog            : True
IsReadOnly                 : False
LdapPort                   : 389
Name                       : ADSECURITYORGDC01
NTDSSettingsObjectDN       : CN=NTDS Settings,CN=ADSECURITYORGDC01,CN=Servers,CN=WashingtonDC,CN=Sites,CN=Configuration,DC
                             =ADSecurity,DC=org
OperatingSystem            : Windows Server 2008 R2 Enterprise
OperatingSystemHotfix      :
OperatingSystemServicePack : Service Pack 1
OperatingSystemVersion     : 6.1 (7601)
OperationMasterRoles       : {SchemaMaster, DomainNamingMaster, PDCEmulator, RIDMaster…}
Partitions                 : {DC=ForestDnsZones,DC=ADSecurity,DC=org, DC=DomainDnsZones,DC=ADSecurity,DC=org, CN=Schema,CN=Co
nfiguration,DC=ADSecurity,DC=org, CN=Configuration,DC=ADSecurity,DC=org…}
ServerObjectDN             : CN=ADSECURITYORGDC01,CN=Servers,CN=WashingtonDC,CN=Sites,CN=Configuration,DC=ADSecurity,DC=org
ServerObjectGuid           : 214c2f5d-f9eb-4b0c-b1fa-4829fc8be2fc
Site                       : WashingtonDC
SslPort                    : 636

Notice that the Get-ADDomainController output above contains both a Computer Object DN and a Server Object DN:

  • The Computer Object DN is where the Domain Controller’s computer object is stored in the domain partition (DC=ADSecurity,DC=org in the example) and represents the DC in the domain.
  • The Server Object DN is the representation of the DC as a replication object and is stored in the configuration partition under the site object (…,CN=Sites,CN=Configuration,DC=ADSecurity,DC=org in the example).

Every replicated change includes the following pieces of information:

  • The Replication Source DC’s GUID replicating the change (originating or replicated write)
  • The Replication Source DC’s USN
  • The Originating DC’s GUID (the DC where the AD update was first performed)
  • The Originating DC’s USN

NOTE: If the Replication Source DC’s GUID & the Originating DC’s GUID are the same, then this is an originating write. If they are different, then the change operation is a replicated write and the replication system uses the Up-To-Dateness Vector. The DCs use all of this information, including the Up-To-Dateness Vector & the High Water Mark, as part of “Propagation Dampening” to ensure that replication doesn’t get stuck in a loop.
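
As a rough illustration of the four fields and the rule in the note, here is a small Python sketch. The class ReplicatedChange and its field names are hypothetical and chosen for this example only; they are not the names of real replication metadata attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicatedChange:
    """The four pieces of information carried with a replicated change (illustrative names)."""
    source_dc_guid: str        # DC we are replicating from
    source_usn: int            # USN the change has on that source DC
    originating_dc_guid: str   # DC where the update was first performed
    originating_usn: int       # USN the change received on the originating DC

    def is_originating_write(self) -> bool:
        """Same source and originating DC means the source made the change itself."""
        return self.source_dc_guid == self.originating_dc_guid


# A change made on DC01 and pulled directly from DC01.
direct = ReplicatedChange("guid-dc01", 3301, "guid-dc01", 3301)
# The same change, later offered by DC02, which received it as a replicated write.
relayed = ReplicatedChange("guid-dc02", 2201, "guid-dc01", 3301)

print(direct.is_originating_write())   # True
print(relayed.is_originating_write())  # False
```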

There are two primary mechanisms that leverage a DC’s USN to track replication updates:

  1. The Up-To-Dateness Vector (UTDV) is a method by which DC replication partners can quickly identify when data needs to be replicated. Each DC has a UTDV that keeps track of the USN for every DC hosting a replicated partition. Every DC in the AD forest hosts the same base partitions and replicates these partitions separately: Configuration & Schema and frequently the Domain DNS & Forest DNS application partitions. All of the DCs in the same domain host the same domain partition. Each partition has an associated UTDV listing all DCs hosting the partition and their associated USNs.
    Simply put, the UTDV is a table of all the DCs for a replicated partition, their replicated USNs, and the date/time when each DC last replicated. The UTDV is sent to the Source DC by the requesting Destination DC (replicating data is always a pull even though the notification may be a push action) to filter out updates the Destination DC already has. At the end of the replication cycle the Source DC sends the Destination DC its UTDV.
    Here’s an example of what it looks like:
    Repadmin: running command /showutdvec against full DC ADSECURITYORGDC02.ADSecurity.org
    Caching GUIDs.
    ..
    WashingtonDC\ADSECURITYORGDC02 (retired) @ USN 65593 @ Time 2012-01-21 12:29:38
    WashingtonDC\ADSECURITYORGDC02 @ USN 62096 @ Time 2012-01-23 21:47:00
    ae7a0f8e-e7bd-4d4f-83df-1c731cdd8e47 @ USN 46734 @ Time 2012-01-21 14:17:26
    WashingtonDC\ADSECURITYORGDC01 @ USN 59160 @ Time 2012-01-23 21:46:33
    Notice how DC02 has two different entries, one with a USN of 65593 (marked Retired) and one with a USN of 62096. AD replication ignores the (Retired) Invocation ID’s USN and uses the current one for replication decisions, though it does keep the retired entry in the UTDV to ensure the USNs are kept separate.
  2. The High Water Mark (HWM) on a DC tracks the highest USN replicated from each DC and uses that to identify when replication is required.
    Simply put, the HWM tracks the last USN replicated from each of the local DC’s replication partners. The HWM is used to track the most recent changes (objects and attributes) a Destination DC has received from Source DCs for a specific partition.
    For example, we have DC01 and DC02:
    DC01’s USN: 3300
    DC02’s USN: 2200
    DC01 has a HWM of 2200 for DC02 and DC02 has a HWM of 3300 for DC01. When a change is performed on DC01, its USN is incremented to 3301, and when DC02 requests updates, it requests any changes since USN 3300. DC01 sends DC02 change #3301 and DC02 performs a replicated write to its AD database, incrementing its local USN to 2201. What’s interesting is that at this point DC01 still has a HWM of 2200 for DC02, while DC02 now has a USN of 2201. Since the DCs have the information about the originating DC and the originating DC’s USN, they know that this change has already been replicated, so they update their HWM and wait for the next change.
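
The DC01/DC02 walk-through can be replayed with a very small Python simulation. This is a simplified sketch under stated assumptions, not how the directory service is implemented: the class Dc, its attributes, and pull_from are invented for this example, a single partition is assumed, and the starting HWM for DC02 is taken to be 2200 so that both DCs begin fully in sync.

```python
class Dc:
    """Toy DC with a USN, a change log, per-partner HWMs and a UTDV (hypothetical model)."""

    def __init__(self, name: str, usn: int):
        self.name = name
        self.usn = usn
        self.changes = []  # (local_usn, originating_dc, originating_usn)
        self.hwm = {}      # partner name -> highest partner USN already seen
        self.utdv = {}     # originating DC name -> highest originating USN applied

    def originating_write(self) -> None:
        """A change made locally: local USN and originating USN are the same."""
        self.usn += 1
        self.changes.append((self.usn, self.name, self.usn))

    def pull_from(self, source: "Dc") -> int:
        """Pull changes from a source DC and return how many were applied."""
        applied = 0
        start = self.hwm.get(source.name, 0)
        for local_usn, orig_dc, orig_usn in source.changes:
            if local_usn <= start:
                continue  # HWM: already seen this part of the source's USN range
            if orig_dc == self.name or orig_usn <= self.utdv.get(orig_dc, 0):
                continue  # propagation dampening: we already have this update
            self.usn += 1  # replicated write
            self.changes.append((self.usn, orig_dc, orig_usn))
            self.utdv[orig_dc] = max(self.utdv.get(orig_dc, 0), orig_usn)
            applied += 1
        self.hwm[source.name] = source.usn
        return applied


dc01, dc02 = Dc("DC01", 3300), Dc("DC02", 2200)
dc01.hwm["DC02"], dc01.utdv["DC02"] = 2200, 2200
dc02.hwm["DC01"], dc02.utdv["DC01"] = 3300, 3300

dc01.originating_write()       # DC01's USN becomes 3301
print(dc02.pull_from(dc01))    # 1  (change #3301 is applied on DC02)
print(dc02.usn)                # 2201
print(dc01.pull_from(dc02))    # 0  (the change came from DC01, so it is filtered)
print(dc01.hwm["DC02"])        # 2201
```

The second pull is the interesting one: DC02's USN moved past DC01's HWM, but the only new change carries DC01 as its originating DC, so nothing is written and only the HWM advances.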

What is USN Rollback?

Since updates are replicated only when the USN on the source DC is larger than the value the destination DC has stored for that source DC (according to the UTDV & HWM), a USN Rollback on a DC prevents AD updates made on that DC from replicating to any other DC. Typically when a USN goes backwards, it is due to a supported restore from backup. When this process occurs, the Invocation ID changes. Since all replica partners track replication based on DC GUID, Invocation ID, and USNs, a supported restore method keeps the previous Invocation ID as “retired” and effectively ignores it. The new database Invocation ID & associated USN are used to get AD changes from the DC… except when the USN rolls back with NO change in Invocation ID. This means that when a DC is in a state of USN Rollback, AD updates can be performed on that DC with NONE of the changes replicated to its replication partners. That’s bad news.

Examples help clarify concepts, so I’ll use one here.

DC01, DC02 & DC03 are Domain Controllers for the Active Directory domain ADSecurity.org.
DC01 is running on physical hardware and DC02 & DC03 are virtual DCs running on VMware ESX.

The AD admin copies the DC02 virtual files on Monday and on Tuesday, DC02 fails.
The admin uses the DC02 VM files from Monday to get DC02 up and running on Thursday (ASAP, right?).

During the next replication cycle, DC01 & DC03 send DC02 the last update USN they have for DC02 (USN #31,131). DC02 checks its local USN and notes that the local USN (USN #29,000) is less than the ones sent (USN #31,131), so no update is replicated to DC01 or DC03. As changes to Active Directory are performed on DC02, they are not replicated to the other ADSecurity.org DCs until DC02’s USN increments past the USN the other DCs have for DC02. This means that 2,131 modifications occur on DC02 before the other DCs receive a single update – and then only changes from USN 31,132 on.

Note that in this situation, any change performed on DC02 is NOT replicated to any other DC. This means that modifications to AD on DC02 only exist on DC02 while other DCs may have different versions of the same objects, including passwords and group membership.  See, bad news.

Day 01 – Sunday:
DC01’s USN #34,951
DC02’s USN #28,539
DC03’s USN #22,360

Day 02 – Monday:
DC01’s USN #35,123
DC02’s USN #29,000 (VM copied)
DC03’s USN #23,101

Day 03 – Tuesday:
DC01’s USN #36,432
DC02’s USN #31,131 – FAILS
DC03’s USN #24,555

Day 04 – Wednesday:
DC01’s USN #38,012
DC02 OFFLINE
DC03’s USN #25,010

Day 05 – Thursday:
DC01’s USN #39,951
DC02 Back online with USN #29,000
DC03’s USN #27,360
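
The arithmetic behind the example can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the function name hidden_and_replicated and its parameters are made up for this illustration, and it assumes each new change on DC02 consumes exactly one USN.

```python
def hidden_and_replicated(partner_usn_for_dc: int, restored_usn: int, new_changes: int) -> tuple:
    """Return (hidden, replicated) counts for changes made after a rollback.

    partner_usn_for_dc: last USN the partners recorded for the DC (31,131 here)
    restored_usn:       USN the DC came back online with (29,000 here)
    new_changes:        changes committed on the DC since it came back
    """
    final_usn = restored_usn + new_changes
    # Partners only ask for changes above the USN they already recorded.
    replicated = min(new_changes, max(0, final_usn - partner_usn_for_dc))
    hidden = new_changes - replicated
    return hidden, replicated


print(hidden_and_replicated(31_131, 29_000, 2_131))  # (2131, 0)
print(hidden_and_replicated(31_131, 29_000, 2_500))  # (2131, 369)
```

In other words, the first 2,131 changes made on DC02 after Thursday never leave DC02, no matter how many more changes follow.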

What causes USN Rollback to occur?

There are several scenarios that Microsoft has identified that can cause USN Rollback on a DC (i.e. unsupported configurations). All of these include the same common theme and that is taking an earlier state of a DC (backup or copy) and running it.

  •  Starting an Active Directory domain controller whose Active Directory database file was restored (copied) into place by using an imaging program such as Norton Ghost.
  • Starting a previously saved virtual hard disk image of a domain controller. The following scenario can cause a USN rollback:
    1. Promote a domain controller in a virtual hosting environment.
    2. Create a snapshot or alternative version of the virtual hosting environment.
    3. Let the domain controller continue to inbound replicate and to outbound replicate.
    4. Start the domain controller image file that you created in step 2.
  • Starting an Active Directory domain controller that is located on a volume where the disk subsystem loads by using previously saved images of the operating system without requiring a system state restoration of Active Directory.

These scenarios pretty much always involve a virtual environment, so if you have virtual DCs you may want to ensure backups are done on the physical DCs (you can never go wrong with performing backups on the FSMOs to start) and/or ensure that the Windows Backup tool is used to back up the AD database (a SUPPORTED backup method is key).

Detecting USN Rollback

Detecting USN Rollback is extremely difficult on DCs running Windows 2000 or Windows Server 2003 RTM, since those versions don’t check for repeating USNs for the same Invocation ID. No NTDS Replication errors appear in the event log, and the replication utility Repadmin reports no errors either. There is a manual method for detecting a USN Rollback using Repadmin, and Microsoft KB 875495 describes this process:

One way to detect a USN rollback is to use the Windows Server version of Repadmin.exe to run the repadmin /showutdvec command. This version of Repadmin.exe displays the up-to-dateness vector USN for all domain controllers that replicate a common naming context. To detect a USN rollback, compare the output of the repadmin /showutdvec command on the domain controller with the output of the same command on the domain controller’s replication partners. If the direct replication partners have a higher USN number for the domain controller than the domain controller has for itself, and the repadmin /showreps command does not report replication errors between direct replication partners, you have compelling evidence of a USN rollback.

C:\>Repadmin /showutdvec dc1 dc=contoso,dc=com
Caching GUIDs…
Site1\DC1 @ USN 10 @ Time 2004-08-04 15:07:15
Site2\DC2 @ USN 24805 @ Time 2004-08-04 15:06:59
C:\>Repadmin /showutdvec dc2 dc=contoso,dc=com
Caching GUIDs…
Site1\DC1 @ USN 50 @ Time 2004-08-04 15:07:15
Site2\DC2 @ USN 24805 @ Time 2004-08-04 15:06:59

Note in the example repadmin output above, DC2 has a USN of 50 for DC1 while DC1 has a USN of 10 for itself.
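
The manual comparison can be partly automated by parsing the text that repadmin /showutdvec prints. The Python sketch below assumes the output keeps the "name @ USN n @ Time ..." line shape shown in this article; the function names parse_utdvec and usn_rollback_suspected are made up for this example. It covers only the USN comparison, so a positive result is evidence rather than proof, and the KB's second condition (no replication errors between the partners) still has to be checked separately.

```python
import re

# Matches lines such as: Site1\DC1 @ USN 10 @ Time 2004-08-04 15:07:15
# and flags entries marked "(retired)" so they can be skipped.
UTDV_LINE = re.compile(
    r"^\s*(?P<dc>\S+)(?P<retired>\s+\(retired\))?\s+@ USN (?P<usn>\d+) @ Time"
)


def parse_utdvec(output: str) -> dict:
    """Map each non-retired DC entry in /showutdvec output to its USN."""
    vector = {}
    for line in output.splitlines():
        match = UTDV_LINE.match(line)
        if match and not match.group("retired"):
            vector[match.group("dc")] = int(match.group("usn"))
    return vector


def usn_rollback_suspected(dc_name: str, own_output: str, partner_output: str):
    """True if the partner records a higher USN for dc_name than the DC has for itself.

    Returns None when dc_name is missing from either output.
    """
    own = parse_utdvec(own_output).get(dc_name)
    partner = parse_utdvec(partner_output).get(dc_name)
    if own is None or partner is None:
        return None
    return partner > own


dc1_output = r"""Caching GUIDs…
Site1\DC1 @ USN 10 @ Time 2004-08-04 15:07:15
Site2\DC2 @ USN 24805 @ Time 2004-08-04 15:06:59"""

dc2_output = r"""Caching GUIDs…
Site1\DC1 @ USN 50 @ Time 2004-08-04 15:07:15
Site2\DC2 @ USN 24805 @ Time 2004-08-04 15:06:59"""

print(usn_rollback_suspected(r"Site1\DC1", dc1_output, dc2_output))  # True
print(usn_rollback_suspected(r"Site2\DC2", dc2_output, dc1_output))  # False
```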

Starting with Windows Server 2003 SP1 (or earlier versions with the KB 875495 hotfix), DCs track and flag when another DC (the source DC) sends a USN that was previously acknowledged with the same Invocation ID.

When a USN Rollback is identified on a DC running Windows Server 2003 SP1 or newer (or an earlier version with the KB 875495 hotfix), the following actions are performed on that DC:

  • The DC writes event 2095 to the DC’s Directory Services event log.
  • The DC disables inbound and outbound replication.
  • The DC recognizes the USN Rollback and pauses its Netlogon service. This prevents any changes from being performed on the DC.

Windows Server 2003 SP1 and newer Domain Controllers that detect a USN Rollback state on a replication source DC log the following event:

Event Type: Error
Event Source: NTDS Replication
Event Category: Replication
Event ID: 2095
Date: 3/10/2005
Time: 4:26:51 PM
User: USN\2B25VB$
Computer: 2B9A
Description: During an Active Directory replication request, the local domain controller (DC) identified a remote DC which has received replication data from the local DC using already-acknowledged USN tracking numbers. Because the remote DC believes it is has a more up-to-date Active Directory database than the local DC, the remote DC will not apply future changes to its copy of the Active Directory database or replicate them to its direct and transitive replication partners that originate from this local DC. If not resolved immediately, this scenario will result in inconsistencies in the Active Directory databases of this source DC and one or more direct and transitive replication partners. Specifically the consistency of users, computers and trust relationships, their passwords, security groups, security group memberships and other Active Directory configuration data may vary, affecting the ability to log on, find objects of interest and perform other critical operations. To determine if this misconfiguration exists, query this event ID using http://support.microsoft.com or contact your Microsoft product support. The most probable cause of this situation is the improper restore of Active Directory on the local domain controller. User Actions: If this situation occurred because of an improper or unintended restore, forcibly demote the DC. Remote DC: b55ee67f-ed73-4970-b2d4-7dc6f571439f Partition: CN=Configuration,DC=usn,DC=loc USN reported by Remote DC: 24707 USN reported by Local DC: 20485 For more information, see Help and Support Center at http://support.microsoft.com.

On the DC with USN Rollback the following events are logged:

Event Type: Warning
Event Source: NTDS General
Event Category: Replication
Event ID: 1113
Date: 3/10/2005
Time: 4:26:51 PM
User: USN\2B25VB$
Computer: 2B9A
Description: Inbound replication has been disabled by the user. For more information, see Help and Support Center at http://support.microsoft.com.

Event Type: Warning
Event Source: NTDS General
Event Category: Replication
Event ID: 1115
Date: 3/10/2005
Time: 4:26:51 PM
User: USN\2B25VB$
Computer: 2B9A
Description: Outbound replication has been disabled by the user. For more information, see Help and Support Center at http://support.microsoft.com

Event Type: Error
Event Source: NTDS General
Event Category: Service Control
Event ID: 2103
Date: 3/10/2005
Time: 4:26:51 PM
User: USN\2B25VB$
Computer: 2B9A
Description: The Active Directory database has been restored using an unsupported restoration procedure. Active Directory will be unable to log on users while this condition persists. As a result, the Net Logon service has paused. User Action See previous event logs for details. For more information, see Help and Support Center at http://support.microsoft.com.

Resolving USN Rollback on a DC

Microsoft recommends two methods to resolve a USN Rollback state:

  1. Demote & re-promote the DC – this resets the Invocation ID & the USN.
  2. Restore the DC from a supported backup (preferably using Microsoft’s Backup utility).

I recommend performing resolution step #1. Since the typical scenarios that bring about a USN Rollback involve imaging or performing an unsupported restoration, having a USN Rollback in the environment typically points to process issues, most often improper restore procedures.

In this situation, it is best to demote the Domain Controller by running dcpromo /forceremoval and perform a metadata cleanup of that DC. If the demoted DC was hosting any of the FSMO roles, they must be seized to another DC.

Only run DCPromo on the server to re-promote it after the metadata cleanup is successful (and any FSMO roles it held have been seized by another DC).
