Troubleshooting SYSVOL replication between domain controllers, using DCDIAG

Since we had a power outage a few days ago, I’ve seen some problems with replication of the sysvol folder across the domain controllers, most likely because file corruption on one domain controller halted replication to the others. (That domain controller had a disk fail in its RAID 1 array, and the array then refused to rebuild because of errors on the surviving disk.)

The sysvol folder is where all group policies and logon scripts are held, and is accessible by all domain members in order to process the policies and scripts. The “original” is held on the first domain controller in the domain.

Replication of the sysvol folder is separate from Active Directory replication. Sysvol replication relies on the File Replication Service (FRS) running on each domain controller, and any failures are logged in the Windows event logs.

Firstly, we had to work out what was going on. This is best done by examining the event logs for these errors, and running some diagnostic tools on the servers – in this case, DCDIAG. DCDIAG is part of the Server 2003 support tools package.

The output looks like this:

 

C:\>dcdiag
Domain Controller Diagnosis
Performing initial setup:
   Done gathering initial info.
Doing initial required tests
   Testing server: Default-First-Site-Name\”DOMAIN CONTROLLER”
      Starting test: Connectivity
         ......................... “DOMAIN CONTROLLER” passed test Connectivity
Doing primary tests
   Testing server: Default-First-Site-Name\”DOMAIN CONTROLLER”
      Starting test: Replications
         ......................... “DOMAIN CONTROLLER” passed test Replications
      Starting test: NCSecDesc
         ......................... “DOMAIN CONTROLLER” passed test NCSecDesc
      Starting test: NetLogons
         ......................... “DOMAIN CONTROLLER” passed test NetLogons
      Starting test: Advertising
         ......................... “DOMAIN CONTROLLER” passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... “DOMAIN CONTROLLER” passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... “DOMAIN CONTROLLER” passed test RidManager
      Starting test: MachineAccount
         ......................... “DOMAIN CONTROLLER” passed test MachineAccount
      Starting test: Services
         ......................... “DOMAIN CONTROLLER” passed test Services
      Starting test: ObjectsReplicated
         ......................... “DOMAIN CONTROLLER” passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... “DOMAIN CONTROLLER” passed test frssysvol
      Starting test: frsevent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.
         ......................... “DOMAIN CONTROLLER” failed test frsevent
      Starting test: kccevent
         ......................... “DOMAIN CONTROLLER” passed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 08/17/2012   15:44:48
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 08/17/2012   15:44:50
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 08/17/2012   15:44:51
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 08/17/2012   15:44:52
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 08/17/2012   15:44:52
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 08/17/2012   15:44:52
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 08/17/2012   15:44:53
            (Event String could not be retrieved)
         ......................... “DOMAIN CONTROLLER” failed test systemlog
      Starting test: VerifyReferences
         ......................... “DOMAIN CONTROLLER” passed test VerifyReferences
   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
   Running partition tests on : nic
      Starting test: CrossRefValidation
         ......................... nic passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... nic passed test CheckSDRefDom
   Running enterprise tests on : nic.local
      Starting test: Intersite
         ......................... nic.local passed test Intersite
      Starting test: FsmoCheck
         ......................... nic.local passed test FsmoCheck

 

The failed tests above are caused by old errors that were logged before the sysvol fix and are still in the event log. If sysvol replication is actually broken, you’ll see the replication tests failing as well, along with the systemlog and frsevent failures.
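When the output runs to a hundred lines, it can help to boil it down to just the failures. The following is a minimal Python sketch, not part of DCDIAG itself: it assumes you have saved the output to text first, and the function name summarize_dcdiag and the regular expressions are my own. It also converts the hexadecimal event IDs that DCDIAG prints into the decimal IDs that Event Viewer shows.

```python
import re

# Matches result lines such as:
#   ......................... nic.local passed test FsmoCheck
RESULT_RE = re.compile(
    r"\.{5,}\s+(?P<target>.+?)\s+(?P<result>passed|failed) test (?P<test>\w+)"
)
# Matches event lines such as:
#   An Error Event occured.  EventID: 0x00000457
EVENT_RE = re.compile(r"EventID:\s+0x(?P<hex>[0-9A-Fa-f]+)")


def summarize_dcdiag(text):
    """Return (failed, event_ids) for a block of dcdiag output.

    failed    -- list of (target, test name) pairs that failed
    event_ids -- sorted list of the reported event IDs, in decimal
    """
    failed = []
    event_ids = set()
    for line in text.splitlines():
        result = RESULT_RE.search(line)
        if result and result.group("result") == "failed":
            # Drop any quotes around the server name placeholder.
            target = result.group("target").strip('"“”')
            failed.append((target, result.group("test")))
        event = EVENT_RE.search(line)
        if event:
            event_ids.add(int(event.group("hex"), 16))
    return failed, sorted(event_ids)


# Excerpt adapted from the output above (straight quotes instead of curly ones).
SAMPLE = """\
      Starting test: frssysvol
         ......................... "DOMAIN CONTROLLER" passed test frssysvol
      Starting test: frsevent
         ......................... "DOMAIN CONTROLLER" failed test frsevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0x00000457
         ......................... "DOMAIN CONTROLLER" failed test systemlog
"""

failed, event_ids = summarize_dcdiag(SAMPLE)
print(failed)
print(event_ids)  # 0x457 in hexadecimal is 1111 in decimal
```

The decimal ID is the one to search for in the System event log, since DCDIAG could not retrieve the event text here.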

 

To fix this, the intact sysvol folder needs to be forced to replicate across the domain. The process is as follows:

Stop the FRS service on all domain controllers.

Locate the BurFlags entry under the following registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup

Change the DWORD value to D4 (hexadecimal) on the “source” domain controller, which flags an authoritative restore, and to D2 on the other domain controllers (non-authoritative). Before doing this, take a backup of the sysvol folder, and make sure you store it on the same partition; otherwise permissions may change, which would affect group policy if you had to restore it.

Then restart the FRS service on all domain controllers (the D4 one first) and wait for replication to occur. This can take up to a few hours, depending on the infrastructure, number of domain controllers, and size of the sysvol folder.
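The order of these steps matters, so here is a rough Python sketch that only prints the commands in sequence and runs nothing. It assumes the standard sc and reg command-line tools with remote (\\server) syntax are available to you, and the host names DC1, DC2 and DC3 in the usage note are made up. I did the registry change by hand in regedit, so treat the generated commands as a starting point to check, and take the sysvol backup before running anything.

```python
# Registry key from the steps above; BurFlags is the DWORD value inside it.
BURFLAGS_KEY = (
    "HKLM\\SYSTEM\\CurrentControlSet\\Services\\NtFrs\\Parameters"
    "\\Backup/Restore\\Process at Startup"
)
AUTHORITATIVE = 0xD4      # set on the one "source" domain controller
NON_AUTHORITATIVE = 0xD2  # set on every other domain controller


def build_plan(source_dc, other_dcs):
    """Return the commands for the BurFlags procedure, in order.

    Nothing is executed; the caller decides what to do with the list.
    """
    all_dcs = [source_dc] + list(other_dcs)
    plan = []

    # 1. Stop FRS everywhere before touching the registry.
    for dc in all_dcs:
        unc = "\\\\" + dc
        plan.append(f"sc {unc} stop ntfrs")

    # 2. Flag the source as authoritative and the rest as non-authoritative.
    for dc in all_dcs:
        value = AUTHORITATIVE if dc == source_dc else NON_AUTHORITATIVE
        key = "\\\\" + dc + "\\" + BURFLAGS_KEY
        plan.append(
            f'reg add "{key}" /v BurFlags /t REG_DWORD /d 0x{value:X} /f'
        )

    # 3. Start FRS again, source (D4) first.
    for dc in all_dcs:
        unc = "\\\\" + dc
        plan.append(f"sc {unc} start ntfrs")

    return plan
```

Calling build_plan("DC1", ["DC2", "DC3"]) and printing each entry gives a checklist to work through. Note that sc returns straight away, so confirm the service has actually stopped on every domain controller before changing BurFlags.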

 

Afterwards, running

net share

at a command prompt will also show you the shared folders on the domain controller – so once this replication is complete, you should see the sysvol and netlogon shares present.
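If you have several domain controllers to check, the share test is easy to script. This is a small Python sketch under one assumption: that the share name is the first column of each line of net share output. The function name missing_shares is my own.

```python
REQUIRED_SHARES = ("SYSVOL", "NETLOGON")


def missing_shares(net_share_output, required=REQUIRED_SHARES):
    """Return the required share names not listed in `net share` output.

    Assumes the share name is the first whitespace-separated field on a line.
    """
    present = set()
    for line in net_share_output.splitlines():
        fields = line.split()
        if fields:
            # Share names are not case-sensitive, so compare in upper case.
            present.add(fields[0].upper())
    return [name for name in required if name not in present]
```

On a domain controller you could feed it the text from subprocess.run(["net", "share"], capture_output=True, text=True).stdout. An empty list means both shares are back; anything else means replication has not finished yet on that server.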

 

Then you can also run DCDIAG tests on each domain controller to confirm.