7

An SBS 2011 server (Windows Server 2008 R2, Exchange at SP1) suddenly stopped making backups, with this error:

Backup unsuccessful. A Volume Shadow Copy Service operation failed. Unknown error (0x800423f4).

When the backup is started manually from the SBS console, it fails after 52 seconds.

Hardware setup

The backup sources are two RAID-1 volumes connected to a P420:

  • 2 x 128GB Samsung SSD 840 — 78 GB out of 119 GB available
  • 2 x 300GB ATA WDC WD3000HLFS — 218 GB out of 279 GB available

The backup destination is a USB drive with 298 GB of (free) space.

System State backup fails

> wbadmin start systemstatebackup -backuptarget:\\?\Volume{3956a561-b129-11e3-805c-7446a0f49555}
...(203.18 MB)...

Failure in a Volume Shadow Copy Service operation.

ERROR - Volume Shadow Copy Service operation error (0x800423f4)
The writer experienced a non-transient error.  If the backup process is retried,
the error is likely to reoccur.

I could not read .etl files

The wbadmin command output also points to log files that should be available at C:\Windows\Logs\WindowsServerBackup\, however there are no .log files there (only .etl files).

NTDS writer is in state "[11] Failed"

> Vssadmin list writers

The only item with an error is the NTDS writer:

Writer name: 'NTDS'
   Writer Id: {b2014c9e-8711-4c5c-a5a9-3cf384484757}
   Writer Instance Id: {d88809aa-a5ef-460e-84c0-4dd8a8350184}
   State: [11] Failed
   Last error: Non-retryable error

Event viewer

In the event viewer Application event log the wbadmin start systemstate command registers

  • an error for application Backup with Event-ID 521 and error number 2155348129.
  • After starting the command, the ESENT event IDs occur in this order: 2001, 2001, 2003, 2006, 2003, 2006,
  • then there is the VSS event 8229 with error 0x800423f4,
  • then there are 18264 events (MSSQL database backup succeeded for MICROSOFT##SSEE, SBSMONITORING and SHAREPOINT),
  • and finally there is the Backup event 521 with error 2155348129.
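The decimal error number in the Backup event is easier to search for once it is written as a hexadecimal HRESULT, the same notation the VSS error uses. A minimal Python sketch of that conversion (the function name `to_hresult` is made up for this illustration):

```python
def to_hresult(error_number: int) -> str:
    """Format an event-log error number as an 8-digit hexadecimal HRESULT."""
    # Mask to 32 bits so that signed (negative) values are handled as well.
    return f"0x{error_number & 0xFFFFFFFF:08X}"


print(to_hresult(2155348129))  # prints 0x807800A1
```

The same function also accepts the signed form that some tools display for these codes.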

Things I tried

  • Reboot
  • Disable CrashPlan backup service
  • Disable SQL Server VSS Writer
  • C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\BIN>PSConfig.exe -cmd upgrade -inplace b2b -force -cmd applicationcontent -install -cmd installfeatures
  • Clear Volume Shadow Copy files for boot volume

    > vssadmin delete shadows /for=c: /all

  • Set Volume Shadow Copy to use unlimited space on both volumes

  • Delete backup catalog

    > wbadmin delete catalog

  • Restart the COM and DCOM services

  • Restart the Volume Shadow Copy Service
  • Uninstall Windows Backup component; reboot; install Windows Backup component
  • Install Update Rollup 4 for Windows Small Business Server 2011 Standard (KB2885319)
  • Re-register the VSS DLLs
  • Install SharePoint 2010 Foundation SP2
  • cd "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\BIN";PSConfig.exe -cmd upgrade -inplace b2b -force -cmd applicationcontent -install -cmd installfeatures
  • Increase the swap file from 32 MB to 1.5x RAM (90000 MB)
  • Run dcdiag /fix; remove old domain controller; reboot; run dcdiag /fix again

Command "dcdiag /fix" fails

Starting test: NCSecDesc
    Error NT AUTHORITY\ENTERPRISE DOMAIN CONTROLLERS doesn't have
       Replicating Directory Changes In Filtered Set
    access rights for the naming context:
    DC=DomainDnsZones,DC=CONTOSO,DC=COM
    Error NT AUTHORITY\ENTERPRISE DOMAIN CONTROLLERS doesn't have
       Replicating Directory Changes In Filtered Set
    access rights for the naming context:
    DC=ForestDnsZones,DC=CONTOSO,DC=COM
    ......................... Contoso-DC1 failed test NCSecDesc 

FRS event log

The File Replication Service log shows some errors with ID 13568: "The File Replication Service has detected the following error in the replica set DOMAIN SYSTEM VOLUME (SYSVOL SHARE): JRNL_WRAP_ERROR."

How do I get this server to complete its backups again?

Pro Backup
  • I can't comment on why the backup is failing, but the dcdiag output is expected if you don't have an RODC in your environment (to be exact, it's because you never ran adprep /rodcprep); see KB 967482 – strongline Jan 12 '16 at 23:10

6 Answers

3

Volume shadow copying may stop working at times for a number of reasons I don't really understand. But I have had success in making the VSS service run correctly again by deleting all existing shadow copies on a particular volume. Do it like this in an elevated command prompt:

vssadmin delete shadows /for=c: /all

I see that you tried to reset the VSS copies for your volumes, but did you do it like this?

Next, check out the ETL files you get - they are parseable if you use the VSS tracing tools available here. In particular, try doing:

vsstrace -etl <file.etl> -o <outfile>

This should give you the logged events in a readable format. If this doesn't give you anything worthwhile, try getting a list of VSS writers like this:

vssadmin list writers

The result should be a list of entities that use the VSS service to write stuff along with a Last error: entry per writer. In particular, you should check if there is more than just the one failing component.
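The check described above can be scripted. The following is a minimal Python sketch, not a tested tool: it assumes the English field labels shown in the question's output (`Writer name`, `State`, `Last error`), and the function name `failing_writers` is made up for this illustration.

```python
import re

# Writer entry copied from the question's "vssadmin list writers" output.
SAMPLE = """\
Writer name: 'NTDS'
   Writer Id: {b2014c9e-8711-4c5c-a5a9-3cf384484757}
   Writer Instance Id: {d88809aa-a5ef-460e-84c0-4dd8a8350184}
   State: [11] Failed
   Last error: Non-retryable error
"""


def failing_writers(text: str) -> list:
    """Return one dict per writer whose 'Last error' is not 'No error'."""
    writers = []
    current = None
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.*)$", line)
        if not match:
            continue  # banner lines and blank lines carry no "key: value" pair
        key, value = match.group(1).strip(), match.group(2).strip()
        if key == "Writer name":
            current = {"name": value.strip("'")}
            writers.append(current)
        elif current is not None and key in ("State", "Last error"):
            current[key] = value
    return [w for w in writers if w.get("Last error") != "No error"]


print([w["name"] for w in failing_writers(SAMPLE)])  # prints ['NTDS']
```

To use it on a live server, capture the real output first, for example with `subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout` from an elevated prompt. On a localized Windows installation the labels may differ.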

EDIT: and this - I just remembered I fixed wbadmin strangeness by resetting the backup catalog. This may or may not be an option for you, but I did it like this:

wbadmin delete catalog

Hope it helps!

MrMajestyk
  • `vssadmin delete shadows /for=c: /all` doesn't improve things. After `wbadmin delete catalog` the backup still fails with the dreaded 0x800423f4 error message. There are 30+ *.etl files on the system. None of them contains the words "vss", "volume" or "shadow". No luck with this answer. – Pro Backup Jan 17 '16 at 18:09
  • Have you tried parsing the .etl files using the `vsstrace` utility I described? Checking for errors in the resulting output files is quite useful IMHO. It might get you something. Also: Did the `wbadmin delete catalog` help? – MrMajestyk Jan 20 '16 at 11:53
  • No: the `wbadmin delete catalog` did not help. Which exact .etl to use with vsstrace? – Pro Backup Jan 20 '16 at 13:31
  • That is hard to say - try the newest one you can find. I don't know the exact logic behind the .etl stuff. Only that once parsed, they are much more readable. – MrMajestyk Jan 22 '16 at 09:38
  • `vsstrace -etl -o ` outputs only 2 lines of text from the 5 MB C:\Windows\Logs\WindowsServerBackup\Wbadmin.0.etl – Pro Backup Sep 24 '16 at 21:26
3

In my case I just needed to set the Volume Shadow Copy Service (VSS) to manual and stop the service. I've seen forums suggest setting this service to automatic; that is bad advice. I've never seen it fix anything related to VSS.

namtaH
1

I had a similar issue recently - it was due to an old Acronis filter driver that didn't get uninstalled correctly. You can check the "Device Stack" property under the details of the drive.

This is a normal stack example:
[Image: example (normal) stack]

In my case there was an additional entry there that I looked up, and discovered it was from Acronis, and so I ran their "full cleanup" utility (https://kb.acronis.com/aticleanup)

Mark Sowul
1

Almost a year of Microsoft updates later, the VSS NTDS error 11 issue is still there.

This time I did:

  1. > vssadmin delete shadows /for=c: /all
  2. Stop CrashPlan Backup Service
  3. Restart COM+ Event System
  4. Restart Volume Shadow Copy
  5. > wbadmin delete catalog

Opening the Windows Small Business Server 2011 backup console now shows that no backup is configured. I re-created the server backup, which also re-formats the USB drive. The first backup attempt stopped after ±52 seconds. The second attempt has now been running for over 30 minutes.

The Windows backup complains after many hours that there is not enough free space available on the drive. I have read that the amount of free space needs to be 2 times the backup size.
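As a sanity check on that rule of thumb, here is a minimal Python sketch of the arithmetic. It uses the volume figures from the question at face value, which is an assumption: those numbers may have changed since, and the used space on the source volumes is only a rough stand-in for the real backup size. The helper name `required_free_gb` is made up for this illustration.

```python
def required_free_gb(backup_size_gb: float, factor: float = 2.0) -> float:
    """Free space needed on the target under the '2 times the backup size' rule."""
    return backup_size_gb * factor


# Figures from the question: "78 GB out of 119 GB available" and
# "218 GB out of 279 GB available", so used space = total - available.
used_ssd_gb = 119 - 78
used_hdd_gb = 279 - 218
used_total_gb = used_ssd_gb + used_hdd_gb

needed_gb = required_free_gb(used_total_gb)
target_free_gb = 298  # free space on the USB drive, per the question

print(used_total_gb, needed_gb, needed_gb <= target_free_gb)  # prints 102 204.0 True
```

With these numbers the 298 GB drive would satisfy the rule, so either the amount of data has grown since the question was written or the rule does not capture what Windows Server Backup actually needs.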

COM+ Event System

Update: On Sunday 30.04.2017 the hard disk drive was replaced with a new drive with plenty of capacity (3 TB). The list of steps above again resulted in the dreaded VSS NTDS 0x800423f4 error. Restarting the machine doesn't help. Restarting individual services doesn't help either: the 0x800423f4 error appears within 1 minute of starting the Win SBS 2011 server backup, except after restarting "COM+ Event System". This was with the CrashPlan service turned off; the machine was last restarted after restarting "Base Filtering Engine". Now "Backup Now" has been running for over 10 minutes without error 0x800423f4. Since the last server restart, these services have been restarted without any change in the "Backup Now" result:

  • Block Level Backup Engine Service
  • Bonjour-service
  • Certificate Propagation
  • ClamWin Free Antivirus Database Updater
  • ClamWin Free Antivirus Scanner Service
  • CNG Key Isolation

Now the Windows Server Backup details show "completed" as the status instead of "The backup is not started". However, the completion window now shows Unknown error (0x80042302).
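Although the console labels it "Unknown error", both VSS codes that have come up in this thread have symbolic names in the Windows SDK header vsserror.h, which are often better search terms than the raw numbers. A small Python lookup sketch, with the names given to the best of my knowledge (the table `VSS_ERRORS` and the helper `describe` are made up for this illustration):

```python
# Symbolic names as I know them from vsserror.h; verify against the SDK.
VSS_ERRORS = {
    0x800423F4: "VSS_E_WRITERERROR_NONRETRYABLE",
    0x80042302: "VSS_E_UNEXPECTED",
}


def describe(code: int) -> str:
    """Return the code in hex followed by its symbolic name, if known."""
    code &= 0xFFFFFFFF  # accept signed values as well
    name = VSS_ERRORS.get(code, "unknown (not in this table)")
    return f"0x{code:08X} {name}"


print(describe(0x800423F4))  # prints 0x800423F4 VSS_E_WRITERERROR_NONRETRYABLE
print(describe(0x80042302))  # prints 0x80042302 VSS_E_UNEXPECTED
```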

The Event Log entry with ID 12294 might be related:

Volume Shadow Copy Service error: Error calling a routine on the Shadow Copy Provider {b5946137-7b9f-4925-af80-51abd60b20d5}. The routine returned E_INVALIDARG. Routine details GetSnapshot({00000000-0000-0000-0000-000000000000},0000000004FB8DF0).

b5946137-7b9f-4925-af80-51abd60b20d5 is not listed when running vssadmin list writers.
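The event text calls this GUID a shadow copy provider rather than a writer, so `vssadmin list providers` may be the more relevant listing to search. A minimal Python sketch of that check follows. It only looks for the GUID anywhere in the command output, so it makes no assumption about the exact output layout; the helper names are made up for this illustration.

```python
import subprocess
import sys

PROVIDER_ID = "{b5946137-7b9f-4925-af80-51abd60b20d5}"  # GUID from event 12294


def mentions_guid(output: str, guid: str = PROVIDER_ID) -> bool:
    """True if the GUID appears in the text, ignoring case and braces."""
    return guid.strip("{}").lower() in output.lower()


def list_providers() -> str:
    """Run 'vssadmin list providers' (Windows only, elevated prompt)."""
    result = subprocess.run(
        ["vssadmin", "list", "providers"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__" and sys.platform == "win32":
    print("provider registered:", mentions_guid(list_providers()))
```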

When trying to re-register the Volume Shadow Copy provider service component:

C:\Windows\System32> regsvr32 /i swprv.dll

The command returns error code: 0x80070715, as it possibly should on Windows 2008 R2.

Pro Backup
  • 1
    FWIW, SBS 2011 servers have been my biggest source of unsolvable problems. Specifically relating to backups and VSS, any SBS or 2008 R2 with Citrix on it is almost guaranteed to be an endless headache. If a writer errors, it won't show up on vssadmin list writers; you'll have to figure out what it refers to in the registry and either fix or delete it there. – SilverbackNet Jul 03 '18 at 02:57
1

I have also had VSS writer errors before. On our servers it is mostly the System, Registry and WMI writers that time out. After a lot of digging I found one solution: recycle (restart) the services below. A server reboot is then not required, and the next backup succeeds.

  • Application Host Helper Service (not applicable)
  • COM+ Event System
  • Cryptographic Services
  • IIS Admin Service
  • Volume Shadow Copy
  • Windows Management Instrumentation

source: https://social.technet.microsoft.com/Forums/office/en-US/41be964f-bc4d-48cf-9940-135fa84eaf61/vss-ntds-writer-failed?forum=windowsbackup#237db1d7-c6a2-4ed4-820c-7d7ff9d317d2
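Restarting the services in the list can be scripted. The following is a minimal Python sketch under several assumptions: the display names are the English ones from the list (a localized server may use different names), "Application Host Helper Service" is left out because it is marked not applicable, and PowerShell's `Restart-Service` is used with `-Force`, which to my understanding is what allows restarting a service that other services depend on. The function names are made up for this illustration. By default it only prints the commands.

```python
import subprocess
import sys

# Display names from the list above (assumed to match this server's locale).
SERVICES = [
    "COM+ Event System",
    "Cryptographic Services",
    "IIS Admin Service",
    "Volume Shadow Copy",
    "Windows Management Instrumentation",
]


def restart_command(display_name: str) -> list:
    """Build the PowerShell command line that restarts one service."""
    escaped = display_name.replace("'", "''")  # escape quotes for PowerShell
    return [
        "powershell", "-NoProfile", "-Command",
        f"Restart-Service -DisplayName '{escaped}' -Force",
    ]


def recycle(services=SERVICES, dry_run=True):
    """Restart each service in order; with dry_run, only print the commands."""
    for name in services:
        command = restart_command(name)
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Pass --run from an elevated prompt to actually restart the services.
    recycle(dry_run="--run" not in sys.argv)
```

Running it without arguments is a dry run, which makes it easy to review the commands before anything is restarted on a production server.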

Before this, I re-applied the PSConfig.exe ... command, added some swap files and removed them again, and rebooted a few times in between. Then I cycled these services, and now "Windows Backup" completes its "System State" backup again without issues.

Pro Backup
0

So, have you by chance tried just completely deleting the backup job and remaking it again? I have had Server Backup do something like you described several times and after jumping through countless hoops simply deleting the job, remaking it, picking the same device and when it says "keep backup" you can say that, and everything is good to go.

thelanranger
  • I don't think that deleting and re-creating the backup will help. Because `ntdsutil` > `snapshot` > `activate instance ntds` > `create` is unable to create a snapshot due to temporary error number 0x800423f4. – Pro Backup May 22 '19 at 15:46
  • I'm not certain of the exact error number that you would receive, but those are the symptoms that I generally receive when I am in your situation. Deleting the entire job and remaking it usually corrects the issue. It's that thing you really try to not do but it is generally what fixes it. I believe there are some specialty permissions that Windows Backup uses to write VSS style data into the System Volume Information directory on the external disk that you cannot alter with vssadmin/ntdsutil etc. – thelanranger May 29 '19 at 23:54