I've got a strange problem with basic Windows scheduled tasks that has baffled me for a few weeks now. The tasks fail to run on some servers, but work on others running on different hardware/VM platforms. We first spotted this deep within one of our production systems, but I have managed to reduce it to a reproduction that needs only minimal changes from the out-of-box configuration. I've written a 5-line batch file that applies those changes to a clean installation, so that each test machine is identical.

The configuration

  • All machines run fresh installs of Server 2016 with the latest updates applied.
  • All machines live in the same OU on Active Directory, and have the same policy set applied
  • We create a local user with a password meeting the complexity requirements set by the domain, and we make it a member of the local Administrators group
  • We grant the user account 'Logon As Batch' and 'Logon As Service' local security rights.
  • Schedule a basic task, running as this user, to open 'notepad.exe'.
  • The task is set to run whether the user is logged on or not, with the password saved in the task.
  • All machines are imaged from the same task sequence, by the same Admin user.
  • All tasks are created by the same admin user.

The hardware that works

We have built VMs on VMware and XenServer, as well as physical HP servers and other models of Dell server (PowerEdge R730, R720, R430). Most of these use spinning 10k/15k SAS disks, though I built one of the HP servers with SATA SSDs in RAID 10 as a test.

The hardware that doesn't work

Our new servers have problems, however. These are new Dell PowerEdge R540s. They have built-in BOSS RAID 1 controllers (basically M.2 SSDs in RAID 1), with SAS SSDs as additional fast storage via a PERC controller.

The problem

On the older hardware, you can see the scheduled task running if you manually trigger it, though obviously you don't see notepad actually open if it's running as a different user.

On the PowerEdge R540s, however, the task fails to start, giving error code 2147943726 (0x8007052e). I believe this is an 'unknown username or bad password' error, despite the credentials being correct and the user account having been freshly created.
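
Breaking the code down supports that reading. A minimal sketch in Python splits the value into its HRESULT fields; the helper name `decode_hresult` is purely illustrative and not part of any Windows tooling:

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    value &= 0xFFFFFFFF  # treat the input as an unsigned 32-bit number
    return {
        "hex": f"0x{value:08x}",
        "failure": bool(value >> 31),        # top bit set means failure
        "facility": (value >> 16) & 0x7FF,   # 11-bit facility field
        "code": value & 0xFFFF,              # low 16 bits: the underlying error
    }


# The decimal value shown in the task history on the R540s.
fields = decode_hresult(2147943726)
print(fields)
```

Facility 7 is the Win32 facility, and Win32 error 1326 is ERROR_LOGON_FAILURE ("The user name or password is incorrect"), so Task Scheduler appears to be relaying an ordinary logon failure rather than a task-specific error.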

(Screenshot of the task history showing the task failing.)

The task also fails when triggered manually, and the following event is audited in the Security event log:

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          12/10/2018 17:30:00
Event ID:      4625
Task Category: Logon
Level:         Information
Keywords:      Audit Failure
User:          N/A
Computer:      <computername>
Description:
An account failed to log on.

Subject:
    Security ID:        SYSTEM
    Account Name:       <computername>$
    Account Domain:     <domainname>
    Logon ID:       0x3E7

Logon Type:         4

Account For Which Logon Failed:
    Security ID:        NULL SID
    Account Name:       @@CyBAAA.....<this is a long Base 62 ID, so I've removed it in case it contains sensitive information>
    Account Domain:     

Failure Information:
    Failure Reason:     Unknown user name or bad password.
    Status:         0xC000006D
    Sub Status:     0xC0000064

Process Information:
    Caller Process ID:  0x590
    Caller Process Name:    C:\Windows\System32\svchost.exe

Network Information:
    Workstation Name:   <computername>
    Source Network Address: -
    Source Port:        -

Detailed Authentication Information:
    Logon Process:      Advapi  
    Authentication Package: Negotiate
    Transited Services: -
    Package Name (NTLM only):   -
    Key Length:     0
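
The two status values in that event are worth reading separately. As a rough sketch, the Python below pulls them out of the event text and maps them to their NTSTATUS names. The function name `failure_codes` and the three-entry lookup table are my own illustration, not a complete list of logon failure codes:

```python
import re

# A few well-known NTSTATUS values; deliberately not exhaustive.
NTSTATUS_NAMES = {
    0xC000006D: "STATUS_LOGON_FAILURE (generic logon failure)",
    0xC0000064: "STATUS_NO_SUCH_USER (account name does not exist)",
    0xC000006A: "STATUS_WRONG_PASSWORD (name is valid, password is wrong)",
}


def failure_codes(event_text: str) -> dict:
    """Map the Status and Sub Status fields of a 4625 event to names."""
    found = {}
    pattern = r"^\s*(Status|Sub Status):\s*(0x[0-9A-Fa-f]+)"
    for label, value in re.findall(pattern, event_text, re.M):
        code = int(value, 16)
        found[label] = NTSTATUS_NAMES.get(code, f"unknown ({value})")
    return found


# The failure section copied from the event above.
sample = """
Failure Information:
    Failure Reason:     Unknown user name or bad password.
    Status:         0xC000006D
    Sub Status:     0xC0000064
"""
codes = failure_codes(sample)
print(codes)
```

The sub status here is "no such user" rather than "wrong password", which suggests the account name handed to the logon call could not be resolved at all, rather than the stored password being rejected.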

Yesterday, I rebuilt 2x R540s, 2x VMs and 1x HP server, and only the R540s had this fault. This is reproducible: we've tried re-imaging all our test machines and each time the result is the same.

Other relevant findings

I can make the R540s work correctly if they are built directly into the default 'Computers' container in AD, meaning they don't get our Security Baseline policy. The tasks install and run perfectly. If I then move the computer object into the same OU as the rest of the machines we're testing with, the tasks stop working. Moving the object back out of the OU into 'Computers' does not make the tasks work again. Clearly something is being changed, but I can't see what, and I don't know why it would only affect R540s in this way and not VMs or other models of hardware.

We have exported and compared the local security policy on working and non-working machines to check for differences. Those which exist are minor, and when the working policy set is imported to a broken machine, it stays broken. Similarly, importing the policy set from a broken machine to a working machine does not break the working machine.
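
For anyone repeating that comparison, here is a minimal sketch of diffing two exports in Python. It assumes the exports are INF-style files such as those written by `secedit /export /cfg <file>`; the function names and the sample data (including the `svc_test` account) are hypothetical:

```python
import configparser


def load_policy(text: str) -> dict:
    """Parse INF-style policy text into {(section, key): value}."""
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False, allow_no_value=True
    )
    parser.optionxform = str  # keep the original case of setting names
    parser.read_string(text)
    return {
        (section, key): (value or "").strip()
        for section in parser.sections()
        for key, value in parser[section].items()
    }


def diff_policies(working: str, broken: str) -> list:
    """List (section, key, working_value, broken_value) for every difference."""
    a, b = load_policy(working), load_policy(broken)
    return sorted(
        (section, key, a.get((section, key)), b.get((section, key)))
        for section, key in a.keys() | b.keys()
        if a.get((section, key)) != b.get((section, key))
    )


# Hypothetical excerpts standing in for two real exports.
working = "[System Access]\nMinimumPasswordLength = 14\n[Privilege Rights]\nSeBatchLogonRight = *S-1-5-32-544,svc_test\n"
broken = "[System Access]\nMinimumPasswordLength = 14\n[Privilege Rights]\nSeBatchLogonRight = *S-1-5-32-544\n"

differences = diff_policies(working, broken)
for row in differences:
    print(row)
```

Real exports are typically UTF-16 encoded, so they would need to be read with something like `open(path, encoding="utf-16").read()` before being passed in. A diff like this only covers the settings that appear in the export, so a clean result does not rule out a policy difference elsewhere.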

If I change the scheduled task so that 'Do not store Password' is ticked, the task does run, but this won't work for us in production as the task needs access to non-local resources.

If I change the scheduled task to run in the context of my domain admin account (while giving myself 'logon as batch' and 'logon as service' rights), it works even with 'Do not store password' unticked. So whatever is breaking, it seems to be related only to local user accounts.

Other things I have tried which made no difference:

  • Task run with highest privileges
  • Local user set with 'password does not expire' turned on/off
  • Local user password strength & length increased/decreased
  • I changed the task type from 'Vista/2008 (default)' to '2016' with no change.
  • I saw a suggestion of changing from 'do not start a new instance' in the task configuration, but this didn't help.
  • I built another physical machine with fast SSD storage in case the machines were imaging too quickly, and there was some kind of race condition caused by the quicker disk access on the R540s. I don't think that's the case.
  • Drivers, firmware and BIOS are all up to date on the R540s as of yesterday.

The conclusion

I think our group policy must be changing something which stops this from working, probably the security baseline in some way. What I'm at a complete loss to explain, however, is why this only seems to break scheduled tasks on a certain hardware model, which should have an identical configuration to working machines created at the same time, in the same way. I can't see what would cause that.

Does anyone know if scheduled tasks or basic security authentication uses hardware features in any way? Does TPM have anything to do with this? Is there anything else I can do to trace back what is causing the user account to fail at the point the task runs? I've run out of ideas.

Also, in case anyone asks, I have tried doing a 'runas' from a command line to prove the local user account I've created works, and that the password I've used is correct.

Thanks!

  • In the security event log entry for the logon failure, is the computername\username correct? – Greg Askew Oct 13 '18 at 12:56
  • Hi @GregAskew, thanks for getting back to me. The security log shows an 'Audit Failure' for Logon at the time, with a Null SID being referenced rather than the account I specified in the task. I'll edit my original question with the details. – Alan Fleming Oct 13 '18 at 15:17
  • The "@@" prefix is usually associated with accounts that are smart card credentials. The 0xC0000064 substatus means "username does not exist". If you export the task to an Xml file, does it contain the correct credentials? Also compare to the output of `schtasks.exe /v /fo CSV`. – Greg Askew Oct 13 '18 at 17:25
  • What is the output of the command: `REG QUERY "HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\System" /v ScForceOption` – Greg Askew Oct 13 '18 at 17:37
  • The scheduled task CSV output is identical when comparing a working machine's output with a non-working machine's (other than the host name). The exported task XML is the same apart from the task creation timestamp, which I'd expect, plus the UserID SID value. I did wonder what the significance of the '@@' prefix was, so thanks for explaining that one. – Alan Fleming Oct 13 '18 at 17:51
  • Does the user SID value in the Xml file match the SID of the local account for the task? – Greg Askew Oct 13 '18 at 18:01
  • ScForceOption is 0 on all machines currently. The user SID listed in the task XML matches what I can see in the registry at `HKLM\Software\Microsoft\Windows NT\CurrentVersion\ProfileList`. I should point out that the entry in 'profilelist' doesn't get created until I manually do a 'runas' to launch cmd.exe as my test user. It doesn't matter if the profile exists or not when running the task. – Alan Fleming Oct 13 '18 at 18:32
  • Not sure why it isn't picking up the correct account. Usually the SID appears in the Xml if something changed with the account, like it was renamed, but it should still work. You may want to try replacing the SID with computername\username, and ensure the LogonType element has Password in it. – Greg Askew Oct 13 '18 at 19:21
  • If I change the UserID in the task XML from SID to computername\username and re-import, it's still the same. When I re-export this new task as an XML file, it has reverted back to the SID. – Alan Fleming Oct 13 '18 at 19:40
  • I've checked the problematic R540 hardware, and can't see any sign of it having any smart card or biometric readers installed. Nothing shows in device manager, and the smart card services are all disabled as they would be on a different type of server. I can't see any differences in the `HKLM\Software\Microsoft\Windows\CurrentVersion\Authentication\CurrentProviders` list on a broken server compared to a working one. Any idea why the scheduled task would be trying to run using Smart Card credentials when that hardware doesn't (obviously) exist? Is there anything else I can check? – Alan Fleming Oct 15 '18 at 10:48
  • It may not be, I was only observing that @@ is sometimes associated with a smart card account. What you may want to do is provision a server in the Computers container, then selectively add the security/policy settings until the issue occurs. For example, add half the settings, and if it does not occur, add another 25%, etc. until you identify the problematic setting. – Greg Askew Oct 15 '18 at 15:06
  • It's a believable line of enquiry I feel. If there was some kind of hardware difference which enabled some kind of Biometrics, then it would explain why it only seems to cause problems on the Dell R540s and not elsewhere. Digging further, I spotted that the 'Biometric' service on the R540s starts and runs normally, according to the Application and Services event log. However, on all other models/VMs, it fails to start properly for some reason. I've got a couple of machines rebuilding just now, and will try applying subsets of policy to see if I can narrow it down further. Thanks for your help! – Alan Fleming Oct 15 '18 at 15:17
  • I've narrowed this down a little further now. It looks like the R540 servers were the only ones with Secure Boot enabled in the BIOS. The other older Dell servers didn't have it enabled initially, and VMs didn't support it. One of our group policies was applying on the machines with it turned on, and was actually failing to apply on the 'working' machines. So, the broken machines are working as 'expected', and the working ones are 'broken'. Typical. Anyway, we're working on that dilemma now, and hopefully will find out which bit exactly is hindering us. – Alan Fleming Oct 19 '18 at 09:39
  • This is third-party stuff, 99.99% sure. Check for AV, Device Guard, policy orchestrator and such. – bjoster Oct 23 '18 at 12:46
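
The task XML checks discussed in the comments above (the UserId value and the LogonType element) can be scripted when comparing exports from several machines. This is only a sketch: the function name `task_principals` and the sample SID are made up, while the namespace URI is the one used by exported Task Scheduler XML:

```python
import xml.etree.ElementTree as ET

# Namespace declared on the root <Task> element of exported task XML.
TASK_NS = {"t": "http://schemas.microsoft.com/windows/2004/02/mit/task"}


def task_principals(xml_text: str) -> list:
    """Return the id, UserId and LogonType of each principal in a task."""
    root = ET.fromstring(xml_text)
    principals = []
    for principal in root.findall("t:Principals/t:Principal", TASK_NS):
        principals.append({
            "id": principal.get("id"),
            "user_id": principal.findtext("t:UserId", namespaces=TASK_NS),
            "logon_type": principal.findtext("t:LogonType", namespaces=TASK_NS),
        })
    return principals


# Trimmed, hypothetical export; the SID below is a placeholder.
sample = """<Task version="1.2" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <Principals>
    <Principal id="Author">
      <UserId>S-1-5-21-1111111111-2222222222-3333333333-1001</UserId>
      <LogonType>Password</LogonType>
      <RunLevel>LeastPrivilege</RunLevel>
    </Principal>
  </Principals>
  <Actions Context="Author">
    <Exec><Command>notepad.exe</Command></Exec>
  </Actions>
</Task>"""

principals = task_principals(sample)
print(principals)
```

A LogonType of `Password` corresponds to a task with stored credentials, whereas `S4U` is what the 'Do not store password' option produces, so this gives a quick way to confirm which mode each exported task is really using.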

1 Answer

The issue appears to have been caused by Device Guard/Credential Guard being enabled via our security baseline policy. Device Guard is only set to run if Secure Boot is enabled in the BIOS, meaning we didn't see it on VMs or older servers, which either didn't support Secure Boot or didn't have it enabled.

There's an article showing the exact same issue, here:

https://support.quest.com/kb/226489/scheduled-backups-are-not-running-on-windows-server-2016

As mentioned in the article above, the solution is to have the task run as the System user if that will suffice, or to disable Device Guard if that is not an option.

We have tested with Device Guard disabled, and once we recreated our scheduled task, it ran successfully with Secure Boot still enabled in the BIOS.

Now that we know why it wasn't working, we can read up on Device Guard/Credential Guard to see how scheduled tasks should be configured with these enabled, and what the best practice is going forward.

Thanks to Greg Askew for his input which ultimately led to this discovery!