
OK, this is a really weird one and I'm not even sure how to properly describe it. We had a customer complain that a specific page on our website wasn't working, and one of our internal technicians was able to reproduce the issue as well. Most of the website is working fine. This is deployed on Azure App Service.

I tried loading the exact same page as the technician, and it worked fine for me. The entire request is identical except for the authentication cookies. When I send the request, I get 200 OK, but the technician and the customer get 404 NOT FOUND.

The issue only started after we did a VIP swap this morning on Azure App Service (which I am new to). I deployed a service update to the staging deployment slot this morning, then a few minutes later did the VIP swap. I think both the customer and the technician had their browsers open, with active sessions, during the swap.

I've done some troubleshooting, and here is what I discovered. I can use Fiddler to capture the exact trace for the page that works fine for me. If I then copy just one value from the technician's failing request into mine, I suddenly get the 404 error as well. The difference is one cookie:

Cookie: ARRAffinity=blahblahblahblah;
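The Fiddler experiment above (keep the working request, swap in only the technician's `ARRAffinity` value) can also be scripted so it is repeatable. This is a minimal Python sketch, assuming the cookie header is a plain `name=value; name=value` list; the URL and cookie values in the usage are hypothetical, and a real replay would likely need the other headers from the trace too.

```python
import urllib.request


def replace_cookie(cookie_header: str, name: str, value: str) -> str:
    """Return cookie_header with cookie `name` set to `value`; other cookies are kept as-is."""
    pairs = []
    found = False
    for part in cookie_header.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty segments, e.g. from a trailing ';'
        key = part.partition("=")[0].strip()
        if key == name:
            pairs.append(f"{name}={value}")
            found = True
        else:
            pairs.append(part)
    if not found:
        pairs.append(f"{name}={value}")  # cookie was absent: add it
    return "; ".join(pairs)


def build_replay_request(url: str, working_cookies: str, failing_affinity: str) -> urllib.request.Request:
    """Build (but do not send) the working request with only ARRAffinity swapped."""
    cookie = replace_cookie(working_cookies, "ARRAffinity", failing_affinity)
    return urllib.request.Request(url, headers={"Cookie": cookie})
```

Sending the result with `urllib.request.urlopen(req)` and comparing status codes (a 404 arrives as `urllib.error.HTTPError`, with the status in `.code`) would reproduce the comparison without Fiddler.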

My basic understanding is that this cookie identifies which server the user is connected to, so that they get affinity to a specific instance in the load-balanced set (2 servers). We were able to fix the issue by having the technician and the customer delete all cookies in their browsers; logging out and back in did not fix it.
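Assuming that description of the affinity cookie is right, a toy model shows how one pinned client can see a broken page while everyone else does not. This is a hypothetical Python sketch, not how ARR is implemented: the instance names, the `/orders/save` path and the idea that one instance is serving a faulty deployment are all illustrative assumptions.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical: an instance is modelled as a function from request path to HTTP status.
Handler = Callable[[str], int]


@dataclass
class AffinityRouter:
    """Toy cookie-based affinity router (not ARR's real implementation)."""
    instances: Dict[str, Handler]
    rng: random.Random = field(default_factory=random.Random)

    def route(self, path: str, affinity: Optional[str]) -> Tuple[int, str]:
        """Return (status code, affinity value the client should store)."""
        if affinity not in self.instances:
            # No cookie, or one naming an unknown instance: pick an instance and pin to it.
            affinity = self.rng.choice(sorted(self.instances))
        # A known cookie always sends the client to the same instance.
        return self.instances[affinity](path), affinity


def healthy(path: str) -> int:
    return 200


def faulty(path: str) -> int:
    # Hypothetical faulted deployment: one page is missing on this instance only.
    return 404 if path == "/orders/save" else 200


router = AffinityRouter({"instance-a": healthy, "instance-b": faulty}, random.Random(0))
```

In this model a client pinned to `instance-b` gets a 404 on one page while the rest of the site works, and logging out and back in changes nothing because the pin lives in the affinity cookie rather than the auth cookie. Clearing cookies drops the pin and gives the client a chance to land on the other instance. That is consistent with the symptoms, but says nothing about what actually went wrong on the real instance.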

Why would a "stale" affinity key cause a random 404 on one specific page? Is it possible that some of these users' requests are actually being directed to the old staging deployment, even though they are hitting the URL for the production deployment?

mellamokb

1 Answer


There are 2 things here:

  1. The session affinity. As you can read in this article, you can now disable session affinity in Web Apps if that suits your use case (e.g., you handle sessions outside the web app, or you simply don't keep session-specific state).
  2. The 404 error is a strange one. It may come from a faulted deployment, so you may want to redo a full deployment to a new slot and swap it again. If you still see errors, look through the site itself for any "stateful" code that could produce instance-specific behavior.
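For point 1, my understanding is that besides the site-wide "ARR affinity" switch (the `clientAffinityEnabled` site property, which I believe `az webapp update --client-affinity-enabled false` also sets), an app can opt out per response by sending the header `Arr-Disable-Session-Affinity: True`. Please verify both against current Azure documentation. Since the asker's site is ASP.NET MVC, this Python WSGI middleware is only an illustration of the header approach; `hello_app` is a hypothetical demo app.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def disable_arr_affinity(app: Callable) -> Callable:
    """WSGI middleware: add 'Arr-Disable-Session-Affinity: True' to every response."""
    def wrapped(environ, start_response):
        def start_response_with_header(status: str, headers: Headers, exc_info=None):
            # Copy the list so the wrapped app's header list is not mutated.
            headers = list(headers) + [("Arr-Disable-Session-Affinity", "True")]
            return start_response(status, headers, exc_info)
        return app(environ, start_response_with_header)
    return wrapped


def hello_app(environ, start_response) -> Iterable[bytes]:
    """Tiny example WSGI app used to demonstrate the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


app = disable_arr_affinity(hello_app)
```

This is only safe if, as the answer says, no session state lives in the instance's memory; otherwise removing affinity trades a stale pin for lost sessions.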

Please let us know what happened at the end.

  • Yes, it turned out that it was a faulted deployment. A coworker suggested the same thing. I redeployed and swapped a second time and the issue went away. It was very strange because almost every page worked fine except for one, and this is an MVC site so it's all in the binaries... Oh well. – mellamokb May 20 '16 at 03:58
  • I'm hitting the same issue on my site. Is there a way to detect if a deployment has this fault? Did removing session affinity help? – Marc O'Morain Oct 11 '16 at 21:28
  • @MarcO'Morain: I have never found a way to detect a faulty deployment. It does seem that about 1 in 10 deployments have weird glitches like this, and the only solution I've found is to log in to the staging site and verify everything is working before doing the VIP swap. If I see any weird behavior that I didn't see on the QA site, my go-to solution is to redeploy and see if the issue goes away. Very annoying, but that's the best I've been able to figure out. – mellamokb Oct 11 '16 at 21:37
  • Thanks @mellamokb. Do you still use affinity cookies, out of interest? – Marc O'Morain Oct 11 '16 at 21:39
  • @MarcO'Morain: That affinity cookie system is built into Azure App Service, I believe. It's not something I turned on intentionally, or have the ability to disable. – mellamokb Oct 11 '16 at 21:43
  • @mellamokb I'm with Marc and struggling with this problem. We noticed this after updating OWIN. It might not be related but I was wondering if you made any changes to OWIN leading up to seeing this issue for the 1st time? Also, we only see this issue with POST and PUT requests. GETs are fine. Did you have the problem with GET requests or only other verbs? – Mr. Flibble Oct 12 '16 at 08:46
  • @Mr.Flibble: As I remember, it was not impacting any GET requests. In fact, I could only find a single PUT/POST request in the whole website that was not working. All other PUT/POST requests were working fine. I'm thinking it was some sort of caching bug, but I'm not entirely sure. Sorry, I am not familiar with OWIN. – mellamokb Oct 12 '16 at 13:24
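The "verify staging before the swap" workflow from the comments could be partly automated. Because only users pinned to one instance saw the 404, and only on one PUT/POST, a check should probably cover every instance and non-GET verbs. This is a minimal Python sketch under several assumptions: the staging URLs are hypothetical, you have captured one `ARRAffinity` value per staging instance (for example from `Set-Cookie` responses in Fiddler), ARR honors that cookie on the staging hostname the way it did in the Fiddler experiment, and any POST/PUT targets are safe to call repeatedly.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Optional, Sequence, Tuple

# (method, url, expected status); the URLs are hypothetical staging-slot pages.
Check = Tuple[str, str, int]
Fetch = Callable[[str, str, Optional[str]], int]


def http_status(method: str, url: str, affinity: Optional[str]) -> int:
    """Send one request, optionally pinned to an instance via the ARRAffinity cookie."""
    headers = {"Cookie": f"ARRAffinity={affinity}"} if affinity else {}
    # Give POST/PUT an empty body (urllib then sends Content-Length: 0).
    data = b"" if method in ("POST", "PUT") else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as HTTPError


def smoke_test(checks: Sequence[Check], affinities: Sequence[Optional[str]],
               fetch: Fetch = http_status) -> List[Tuple[str, str, Optional[str], int]]:
    """Run every check once per affinity value; return the ones that failed."""
    failures = []
    for method, url, expected in checks:
        for affinity in affinities:
            status = fetch(method, url, affinity)
            if status != expected:
                failures.append((method, url, affinity, status))
    return failures
```

The `fetch` parameter exists so the logic can be exercised with a fake instead of the network. A non-empty result before the swap would be the cue to redeploy, as the thread suggests, rather than swapping a partially broken slot into production.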