Problem: After an undetermined amount of time, all websites running in an application pool return 503 errors because IIS has stopped the pool due to rapid failures from its worker process. Increasing the rapid-fail limit from 5 to 50 failures still doesn't fix the issue.

In Event Viewer I can see many warnings at the time of the crash; the last entry is an error, which is what stops the pool. The warnings all say one of these two things:

A process serving application pool 'domain.com' suffered a fatal communication error
with the Windows Process Activation Service. The process id was 'XXXX'. The data field
contains the error number

A process serving application pool 'domain.com' terminated unexpectedly. The process id
was 'XXXX'. The process exit code was '0xff'.
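To get a feel for how often each of the two warnings occurs, something like the following could tally them. This is only a rough sketch: it assumes the warning texts have been copied out of Event Viewer as plain strings, and the function names `classify` and `tally` are made up for illustration.

```python
import re
from collections import Counter

# Patterns for the two warning texts quoted above.
COMM_ERROR = re.compile(
    r"A process serving application pool '(?P<pool>[^']+)' suffered a fatal "
    r"communication error with the Windows Process Activation Service\. "
    r"The process id was '(?P<pid>[^']+)'"
)
TERMINATED = re.compile(
    r"A process serving application pool '(?P<pool>[^']+)' terminated "
    r"unexpectedly\. The process id was '(?P<pid>[^']+)'\. "
    r"The process exit code was '(?P<code>[^']+)'"
)


def classify(message):
    """Return (kind, pool, pid, exit_code), or None if the text is unrecognized."""
    text = " ".join(message.split())  # undo line wrapping in the copied text
    m = COMM_ERROR.search(text)
    if m:
        return ("communication_error", m["pool"], m["pid"], None)
    m = TERMINATED.search(text)
    if m:
        return ("terminated", m["pool"], m["pid"], m["code"])
    return None


def tally(messages):
    """Count recognized warnings per (pool, kind); unrecognized text is skipped."""
    counts = Counter()
    for msg in messages:
        hit = classify(msg)
        if hit:
            kind, pool, _pid, _code = hit
            counts[(pool, kind)] += 1
    return counts
```

Calling `tally(list_of_warning_texts)` gives a count per pool and warning type, which at least shows whether the "terminated unexpectedly" exits or the communication errors dominate before the pool is stopped.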

System: IIS 7.5, ColdFusion 10 (connects via Tomcat), Windows Server 2008

Initially I thought this was due to a CF bug that was reported and subsequently fixed in HotFix 4. However, my problem remained after the update. (Before the update there were several getRealPathFromConn errors in the CF error logs; since the update those errors are gone, but the application pool still stops.) I have searched for and discussed the issue extensively in various forums (Adobe/Stack), and now I am trying here.

What I've done thus far:

Following this article: http://blogs.coldfusion.com/post.cfm/tuning-coldfusion-10-iis-connector-configuration I changed my connection pool timeout to 60 seconds in both server.xml and workers.properties.

Raised the Rapid Fail failure limit in IIS from 5 to 50.

Updated CF to latest version.

Question(s):

What is the best way to diagnose what is causing the issue? (I assume the server runs out of available connections, but how can I know for sure what is causing the errors to be sent to IIS?)

Can anyone point me in the right direction as to what items I might tune in the workers.properties, server.xml etc that I haven't tried?

How can I know for sure that the connector is actually causing the error in IIS? In this thread (http://forums.adobe.com/thread/1023068) I've received help from another user (Lee Bartelme), who thinks the connector is the issue. From reading other threads here and elsewhere on the internet, it does indeed seem that the connector could be the cause, but could it be something else?

I thought about creating a separate connector for just the application that is crashing, but I don't think it will help much, as the site on the failing application pool gets most of our traffic. Other applications also reference resources from the main site, so almost any page request ends up requesting resources from it.

(Side note: when a user requests a page and that page pulls three resources from domain.com/assets, let's say three stylesheets, does that use three connections or just one? Or am I misunderstanding what a connection is?)

If you need any other information, please let me know what to provide. All my files are the defaults set by ColdFusion, with two exceptions. My workers.properties has worker.cfusion.connection_pool_timeout = 60, and my server.xml has <Connector port="8012" protocol="AJP/1.3" redirectPort="8445" tomcatAuthentication="false" connectionTimeout="60000" /> in place of <Connector port="8012" protocol="AJP/1.3" redirectPort="8445" tomcatAuthentication="false" />. (That is, I added the connectionTimeout attribute.)
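As I understand it, connection_pool_timeout in workers.properties is given in seconds while connectionTimeout on the AJP connector is in milliseconds, so 60 and 60000 should describe the same timeout. For anyone wanting to double-check their own pair of values, here is a small sketch of that comparison. It assumes the relevant workers.properties text and the `<Connector>` element have been pasted in as strings, and the helper names are invented for this example.

```python
import re
import xml.etree.ElementTree as ET


def pool_timeout_seconds(workers_text, worker="cfusion"):
    """Read worker.<name>.connection_pool_timeout (seconds) from workers.properties text."""
    pattern = re.compile(
        r"^\s*worker\." + re.escape(worker)
        + r"\.connection_pool_timeout\s*=\s*(\d+)\s*$",
        re.MULTILINE,
    )
    m = pattern.search(workers_text)
    return int(m.group(1)) if m else None


def ajp_timeout_ms(connector_xml):
    """Read connectionTimeout (milliseconds) from a single AJP <Connector> element."""
    element = ET.fromstring(connector_xml)
    if element.get("protocol") != "AJP/1.3":
        return None  # not the AJP connector
    value = element.get("connectionTimeout")
    return int(value) if value is not None else None


def timeouts_match(workers_text, connector_xml):
    """True only when both values are present and describe the same duration."""
    secs = pool_timeout_seconds(workers_text)
    ms = ajp_timeout_ms(connector_xml)
    return secs is not None and ms is not None and secs * 1000 == ms
```

With the values from my files, `timeouts_match` returns True; with the default connector line (no connectionTimeout attribute) it returns False, because there is nothing to compare against.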

Also, I am a developer, not a server administrator. However, I am tasked with keeping our server running, so please be gentle when explaining things and assume I am unfamiliar with any terms/items/files not listed in my question.

Leeish
  • See this [Microsoft support page](http://support.microsoft.com/kb/919789) on how to use the Microsoft Debug Diagnostics tool for errors such as these. We had to use it once, and it was pretty impressive at pinpointing our issue. (Not related to this.) It will show you what is happening when the service fails. – Miguel-F Mar 14 '13 at 11:59
  • Thanks... I'll have to try this again. I tried setting up rules before but couldn't ever get it to dump anything. I probably haven't set up my rule right or whatever. It's not overly intuitive to do. Thanks. – Leeish Mar 14 '13 at 14:38
