5

We have a fleet of AWS EC2 instances running Windows Server. Since moving from Windows Server 2012r2 to 2016, we've encountered an issue where a server is shut down for unknown reasons. After an exhaustive inspection of event logs, the only consistency appears to be the following:

The process C:\Windows\system32\winlogon.exe ([computername]) has initiated the power off of computer [computername] on behalf of user NT AUTHORITY\SYSTEM for the following reason: No title for this reason could be found
Reason Code: 0x500ff
Shutdown Type: power off

We've considered and theoretically ruled out the following:

  1. Windows Updates issue

    • No updates were running according to event logs or Get-WindowsUpdateLog. Sconfig > "Windows Update Settings" is set to DownloadOnly
  2. Power button toggle, or hardware/battery issue

    • This is an AWS EC2 instance and we've never experienced this with any 2012r2 or 2012 servers. If it were hardware-related, surely it would affect all server versions.
  3. Windows Server license expiration

    • These servers are licensed correctly according to "slmgr.vbs /dlv", and the shutdowns have happened at 39, 62, and 188 days after their initial turn-on.
  4. With old versions of mstsc there is a power button displayed on the logon screen, which can be used to turn off the system in this manner

    • This theory is largely based on this post, but to be clear, that post is about a 2012 server and we're on 2016. I have also not been able to repro this at all.

Does anyone have any idea what could be causing this shutdown? Or, any idea how we could go about finding more information? I've looked through every log file and event log I can find. There is also no dmp file corresponding to the time of shutdown.

Nathan
  • Because the shutdown request is coming from winlogon.exe and the local system account, it pretty much has to be either someone shutting it down at the logon screen (via the virtual console, or perhaps via RDP in edge cases) or by pushing the (virtual) power button. Shutting down at the logon screen is disabled by default on server operating systems, but you might double-check that setting. As to the power-button option, have you asked Amazon support about this? – Harry Johnston Apr 10 '19 at 23:10
  • ... actually that post you link to says that on their servers powering down from the console was enabled by default, so you really should check that setting. – Harry Johnston Apr 10 '19 at 23:14
  • Does stopping the instance from the EC2 console result in the same information being logged? That should look like a power off. – Michael - sqlbot Apr 11 '19 at 03:14
  • @HarryJohnston I haven't been able to reproduce the power button on the logon/lock screen using a variety of old mstsc's, but that option appears to actually be enabled. We were hoping to repro before configuring a GPO to disable it, but perhaps the time has come. – Nathan Apr 11 '19 at 14:36
  • @Michael-sqlbot No - I tested that and the shutdown is a different type. – Nathan Apr 11 '19 at 14:37
  • I've never used AWS, but I thought it provided a way to connect via a virtual console? I don't know about getting to the logon screen via Remote Desktop (except for [this](https://serverfault.com/a/959420/94065) edge case which is unlikely to apply) but perhaps there are third-party clients that can do it or something. (Personally I would be wondering if these incidents corresponded to maintenance operations of some kind on the underlying hardware, or perhaps power outages. But as I say, I've never used AWS.) – Harry Johnston Apr 11 '19 at 19:07
  • Yeah, we can RDP to AWS instances just like any other Windows server. Our going theory is that there's some edge case (possibly with a third-party client as you suggest) where the power option can be selected from the lock screen (if this setting is enabled in GP). If our GP fix works we may suggest to AWS that they update their default AMI, as the default for 2016 is "enabled" and for 2012 and 2012r2 it's "disabled". We were also suspicious of power or hardware incidents, but if that were the case we'd expect more reports of this issue, and we'd also expect to see it on our 2012r2 servers. – Nathan Apr 11 '19 at 19:36

3 Answers

3

The Reason Code says that it's a BlueScreen (SHTDN_REASON_MAJOR_SYSTEM | SHTDN_REASON_MINOR_BLUESCREEN)

Reference: https://docs.microsoft.com/fr-fr/windows/desktop/Shutdown/system-shutdown-reason-codes

You should check that your drivers and software are up to date. Don't forget to check your antivirus too, because an outdated third-party antivirus can cause bluescreens.

You can use BlueScreenView to help you analyze BSOD memory dumps (if any).

Swisstone
  • That sounds equally reasonable and troublesome, as these instances were built from recent AMIs. As an aside, looking at that link I'm unclear on how reason code combinations work. The reason code from our event is `0x500ff` whereas the reason code for _SHTDN_REASON_MAJOR_SYSTEM_ is `0x00050000` and _SHTDN_REASON_MINOR_BLUESCREEN_ is `0x0000000F`. If we add them, we'd get `0x5000F`, which is close but not exact. What am I missing to get the extra F? – Nathan Apr 10 '19 at 19:31
  • `0x500ff` just means a user-initiated shutdown. – Harry Johnston Apr 10 '19 at 23:08
  • @HarryJohnston can you explain how you're able to determine that from MS documentation? – Nathan Apr 11 '19 at 13:38
  • It is documented as a bug [here](https://support.microsoft.com/en-us/help/2001061/on-a-computer-running-windows-vista-windows-7-windows-server-2008-and) but they seem to have decided to just leave it that way, since even the latest version of Windows 10 does the same thing. No documentation that I can find, but that's how it is. – Harry Johnston Apr 11 '19 at 18:58
  • @HarryJohnston Ah, thanks! I saw that link before but did not properly contextualize it as the same incorrect reason code we're seeing, mostly because we're not seeing two 1074 events - just one. We're going with this theory though, have created and linked a GPO to turn off the power button, and will update this question with our results. – Nathan Apr 11 '19 at 19:31
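On the "extra F" question raised in the comments above: a reason code is not a sum of arbitrary values. It packs a major reason into the high word and a minor reason into the low word. Here is a minimal Python sketch of that split, assuming the constant values listed on Microsoft's "System Shutdown Reason Codes" page. The function name `decode_reason` and the lookup tables are only illustrative, and the minor table is deliberately incomplete. Under those assumptions, the low word `0xff` is its own minor value (`SHTDN_REASON_MINOR_NONE`, no reason given) rather than `SHTDN_REASON_MINOR_BLUESCREEN` (`0x0F`) with something added.

```python
# Constants taken from Microsoft's "System Shutdown Reason Codes" page.
# The minor table is intentionally partial; unknown values fall back to hex.
MAJOR_REASONS = {
    0x00000000: "SHTDN_REASON_MAJOR_OTHER",
    0x00010000: "SHTDN_REASON_MAJOR_HARDWARE",
    0x00020000: "SHTDN_REASON_MAJOR_OPERATINGSYSTEM",
    0x00030000: "SHTDN_REASON_MAJOR_SOFTWARE",
    0x00040000: "SHTDN_REASON_MAJOR_APPLICATION",
    0x00050000: "SHTDN_REASON_MAJOR_SYSTEM",
    0x00060000: "SHTDN_REASON_MAJOR_POWER",
    0x00070000: "SHTDN_REASON_MAJOR_LEGACY_API",
}
MINOR_REASONS = {
    0x0000000F: "SHTDN_REASON_MINOR_BLUESCREEN",
    0x000000FF: "SHTDN_REASON_MINOR_NONE",
}
FLAG_USER_DEFINED = 0x40000000  # SHTDN_REASON_FLAG_USER_DEFINED
FLAG_PLANNED = 0x80000000       # SHTDN_REASON_FLAG_PLANNED


def decode_reason(code):
    """Split a shutdown reason code into major reason, minor reason, and flags."""
    major = code & 0x00FF0000
    minor = code & 0x0000FFFF
    return {
        "major": MAJOR_REASONS.get(major, hex(major)),
        "minor": MINOR_REASONS.get(minor, hex(minor)),
        "planned": bool(code & FLAG_PLANNED),
        "user_defined": bool(code & FLAG_USER_DEFINED),
    }


if __name__ == "__main__":
    # The code from the event in the question, then the bluescreen combination.
    print(decode_reason(0x500FF))
    print(decode_reason(0x5000F))
```

Decoded this way, `0x500ff` and `0x5000F` share the major reason but differ in the minor one, which fits the comments saying this event is not a bluescreen.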
0

Just wondering if this issue ever happened again after you put this group policy in place?

I have the exact same issue. We have EC2 instances, and two of them were shut down. We spoke with AWS, who said no API calls were made to shut them down. We have ConnectWise Control, but it doesn't show anyone logging in.

I was able to get the exact same event ID when I shut down the server without authentication via ConnectWise Control; however, there was no one logged in at that time.

We also use Royal TS to access the servers. Not sure if you had any of these products.

I opened a case with Microsoft and they are not able to find it either.

I would really appreciate it if you could let me know whether the issue went away after you put the group policy in place.

thanks

  • We never saw this issue again after configuring the group policy, thankfully. We don't use Royal TS or ConnectWise Control. – Nathan Apr 05 '22 at 21:17
0

We followed the advice in @HarryJohnston's comment and created a GPO to disable the option to shut down a server from the lock screen. The specific policy is:

Computer Configuration\Windows Settings\Security Settings\Local Policies\Security Options > Shutdown: Allow system to be shut down without having to log on > Disabled

Since doing this a month ago we have seen no unexpected shutdowns (and were seeing about one per week previously). It is still strange to me that AWS's default Windows Server 2016 AMI would have this option enabled, and that it would actually be accessible from somewhere, but that seems to have been the case.
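For anyone wanting to check this setting on an existing instance before rolling out a GPO, here is a rough Python sketch. It assumes the policy is backed by the `ShutdownWithoutLogon` DWORD under `HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System`, which is how Microsoft documents this security option, and it reads the value with the standard-library `winreg` module (Windows only). The helper names are illustrative and not part of any tool we used.

```python
try:
    import winreg  # standard library, only available on Windows
except ImportError:
    winreg = None

# Assumed registry location backing the "Shutdown: Allow system to be shut
# down without having to log on" security option.
KEY_PATH = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System"
VALUE_NAME = "ShutdownWithoutLogon"


def interpret(value):
    """Translate the raw DWORD (or None when absent) into a readable state."""
    if value is None:
        return "not set"
    return "enabled" if value != 0 else "disabled"


def read_shutdown_without_logon():
    """Return the raw registry value, or None if the key or value is missing."""
    if winreg is None:
        raise RuntimeError("winreg is only available on Windows")
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    if winreg is None:
        print("Run this on the Windows instance you want to check.")
    else:
        print("Shutdown without logon:", interpret(read_shutdown_without_logon()))
```

A result of "enabled" on a server would match what we found on our 2016 instances before applying the GPO.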

Nathan