A relatively simple Azure App Service (currently .NET 4.6.2, against Azure SQL) has been running for over 18 months. It is rock solid. I rarely think about this site and have not released an update for several months.

I woke this morning to find emails from customers saying that the website was reporting "The specified CGI application encountered an error and the server terminated the process." As a first guess, I clicked "Restart" on the App Service in the Azure portal. About a minute later it came back to life, and it has been running fine ever since.

I went to "Diagnose and solve problems" -> "Availability and Performance". The "Requests and Errors" timeline showed the moment the web site went down and when it came back to life. I drilled into the timeline and selected "Full Report".

In a very matter-of-fact way, it reported the following:

Application stop events are detected. We analyzed 3 Platform Events, 1 User Event.

Platform (File Server Upgrade): Your application was recycled due to a file server upgrade. This event occurred multiple times during the day across multiple instances. These events cause a Storage Volume movement which may result in a restart of your application. If this restart event negatively impacts the availability of the application, enabling the Local Cache feature can help reduce dependency on storage file servers to some extent. Learn more: Check Local Cache described in the Troubleshooting and Next Steps.

Platform (Infrastructure Upgrade): Around 11/20/2019 2:09:57 PM (UTC), on Instance xxxxxxxx, your application was recycled as the Azure scale unit was undergoing an upgrade. There are periodic updates made by Microsoft to the underlying Azure platform to improve overall reliability, performance, and security of the platform infrastructure where your application is running on. Most of these updates are performed without any impact upon your web app. To reduce the impact of such events on your application, consider deploying your application to multiple regions and use Azure Traffic Manager to distribute the load across regions.

User (Stop Site): Around 11/20/2019 9:00:00 PM (UTC), your application process was restarted due to a user action like stopping the site from the Azure portal.

I am at a total loss as to what to do and how to prevent this from happening again.

I suspect the "local cache" suggestion is a red herring. I use the file system to create a few temporary files that the code deletes afterwards.
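For short-lived scratch files like these, one pattern that keeps them off the app's content folder is to create them in the process temp directory and let a context manager delete them even when an exception is raised. My understanding is that on Windows App Service the TEMP directory sits on instance-local storage (under D:\local) rather than on the shared D:\home content served by the storage file servers, but that is worth confirming for your plan. A minimal Python sketch of the pattern only (the app here is .NET); `process_with_scratch_dir` is a hypothetical name:

```python
import os
import tempfile
from typing import Tuple


def process_with_scratch_dir(payload: bytes) -> Tuple[str, int]:
    """Write payload to a scratch file, measure it, and clean up (hypothetical helper).

    TemporaryDirectory() is created under tempfile.gettempdir() (driven by the
    TMPDIR/TEMP/TMP environment variables) and is deleted, with its contents,
    when the with-block exits, even if an exception is raised inside it.
    """
    with tempfile.TemporaryDirectory(prefix="scratch-") as workdir:
        path = os.path.join(workdir, "work.tmp")
        with open(path, "wb") as f:
            f.write(payload)
        size = os.path.getsize(path)
    # The directory and the file inside it no longer exist at this point.
    return path, size
```

The returned path is only useful for confirming the cleanup happened; nothing should open it afterwards.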

Googling has returned few results.

I guess I am after suggestions as to what I can do to ensure that this never happens again.

Any ideas?

Thanks in advance.

  • Did you ever find a solution for this? I'm facing the same, with the exact same message. – Douglas Timms Jan 23 '20 at 18:51
  • Nothing whatsoever. After this incident I am seriously considering going AWS serverless. – DJA Jan 28 '20 at 01:36
  • @DouglasTimms, what did you end up doing? This is still making me nervous. On a related topic, I have some Azure background processes running, and they email me when they encounter an exception. Once every week or so, they cannot establish a connection to the Azure SQL database. I guess it is down for maintenance for a minute or so. – DJA Feb 13 '20 at 03:19
  • I ended up ignoring the problem. Fortunately it hasn't happened again. – Douglas Timms Feb 13 '20 at 19:17
  • Spoke too soon - this happened again. – Douglas Timms Feb 24 '20 at 00:24
  • I have now set the WEBSITE_LOCAL_CACHE_OPTION to "Always" in hopes that will prevent this in the future. – Douglas Timms Feb 24 '20 at 00:37
  • Further update: setting WEBSITE_LOCAL_CACHE_OPTION absolutely did not help and seemed to make things worse. – Douglas Timms Feb 24 '20 at 16:44
  • @DouglasTimms I am getting the same issue, I'm about to try local cache, in what way did it make things worse? – user1069816 Feb 27 '20 at 11:37
  • @user1069816 Normally this problem seemed to occur infrequently for me (i.e. it happened once, then 2 weeks later once). But after setting the local cache option, the problem recurred within a couple hours. Maybe coincidence...not sure. – Douglas Timms Feb 27 '20 at 19:56
  • My understanding is that all the instances in the web app restart when Azure is messing with Azure Storage (which happens infrequently, as you say), but moving to local cache prevents this. I've had local cache on in our non-production environment today and it seems OK *fingers crossed*; if it remains stable I will try it in production. – user1069816 Feb 27 '20 at 23:15
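For the occasional Azure SQL connection failures DJA mentions, a common mitigation is to retry transient failures with capped exponential backoff and jitter rather than failing on the first attempt. A minimal Python sketch, assuming the caller supplies `is_transient` to decide which exceptions are worth retrying (for a real database driver that means checking specific error codes); `retry_transient` and its parameters are hypothetical names:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_transient(
    operation: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on non-transient errors or once attempts are exhausted.
            if not is_transient(exc) or attempt == max_attempts:
                raise
            # Delay doubles per attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random time up to the computed delay.
            sleep(random.uniform(0, delay))
    # Not reached: the loop always returns or raises once max_attempts >= 1.
    raise RuntimeError("unreachable")
```

Injecting `sleep` keeps the function testable without real waits; a total retry window of roughly a minute would cover the short maintenance blips described above.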

3 Answers


In my case setting WEBSITE_LOCAL_CACHE_OPTION to Always did not work.

Instead, setting WEBSITE_ADD_SITENAME_BINDINGS_IN_APPHOST_CONFIG to 1 was what finally helped.
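Both of these are App Service app settings, which the platform exposes to the application as environment variables. One way to confirm which values a given instance actually picked up after a restart is to log them at startup. A minimal Python sketch; `effective_settings` is a hypothetical helper, and in a .NET app `Environment.GetEnvironmentVariable` would play the same role:

```python
import os
from typing import Dict, Mapping, Optional

# App settings discussed in this thread.
SETTINGS_OF_INTEREST = (
    "WEBSITE_LOCAL_CACHE_OPTION",
    "WEBSITE_ADD_SITENAME_BINDINGS_IN_APPHOST_CONFIG",
)


def effective_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Return each setting's value as seen by this process, or None if unset."""
    return {name: environ.get(name) for name in SETTINGS_OF_INTEREST}


if __name__ == "__main__":
    for name, value in effective_settings().items():
        print(f"{name} = {value if value is not None else '(not set)'}")
```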


We recently experienced a similar but slightly different issue. The app would be slow or unresponsive on some of the instances after an upgrade.

Eventually, after many hours of troubleshooting with MS, we narrowed this down to some instances being inconsistent, which caused problems with Application Insights (Java Spring Boot).

getCanonicalName worked differently on those instances: instead of returning an IP address, it returned something else. We had to modify the Catalina settings to mitigate this. A fix seems to be in the latest Application Insights SDK.


I had something similar (in my case, however, the web app didn't start because the temporary storage was full), and I'm pasting below the response I got from a Microsoft Support Engineer on how to avoid the problem in the future.

There was a storage file server reboot on this instance, and the web app was not able to start afterwards until you made a manual restart; the web app got stuck. To avoid this kind of issue, you can adhere to the following best practices:

  1. Use 2 instances all the time. These instances are in different upgrade domains and hence will not be upgraded at the same time. While one worker instance is being upgraded, the other is still active to serve web requests. The web app is currently configured to run on only one instance. Since you have only one instance, you can expect downtime: when the App Service platform is upgraded, the instance on which your web app is running will be upgraded, so your web app process will be restarted and will experience downtime.

  2. Use Health Check. This feature automatically removes a faulty instance from rotation, thus improving availability. It pings the specified health check path on all instances of your web app every 2 minutes. If an instance does not respond within 10 minutes (5 pings), the instance is determined to be unhealthy and our service will stop routing requests to it. It is highly recommended that production apps use this feature to minimize any potential downtime caused by a faulty instance. Note: the Health Check feature only works for applications that are hosted on more than one instance. For more information, check the documentation below. https://github.com/projectkudu/kudu/wiki/Health-Check-(Preview)
    Article about best practices
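Health Check needs a path on the app that answers quickly and reflects whether the instance can actually serve traffic. A minimal Python sketch of such an endpoint using only the standard library; `/health`, `health_response`, `make_handler` and the probe callable are hypothetical, and my understanding is that the platform treats a 2xx response as healthy (worth confirming in the current documentation):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Tuple

HEALTH_PATH = "/health"  # hypothetical; configure the same path in the portal


def health_response(probe: Callable[[], bool]) -> Tuple[int, bytes]:
    """Map a dependency probe to an HTTP status: 200 if healthy, 503 otherwise."""
    try:
        healthy = probe()
    except Exception:
        # A probe that throws (e.g. a database timeout) counts as unhealthy.
        healthy = False
    return (200, b"OK") if healthy else (503, b"Unhealthy")


def make_handler(probe: Callable[[], bool]):
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Ignore any query string when matching the path.
            if self.path.split("?", 1)[0] != HEALTH_PATH:
                self.send_error(404)
                return
            status, body = health_response(probe)
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthHandler


if __name__ == "__main__":
    # Replace the lambda with a cheap real check, e.g. a trivial database query.
    HTTPServer(("", 8000), make_handler(lambda: True)).serve_forever()
```

Keeping the probe cheap matters: it runs on every ping, on every instance.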
