5

I'm trying to debug a timeout problem I have with Apache, for some months now.

The pattern looks like this:

On every first request of a new session (or after some time after the last request) the browser asks instantly for credentials, then sends the request with basic auth. Then the server waits exactly 1 minute before sending the result.

enter image description here

Subsequent requests are answered instantly, this only happens for requests after some time (couldn't pinpoint it exactly yet, between 5 and 15 minutes).

The fact that the wait time is reproducible exactly 60 seconds smells like a timeout to me. If I cancel the request and hit reload I get the requested URL instantly.

Since the password prompt appears instantly as well I can rule out a problem with the SSL handshake between client and server, or DNS problems on that leg. It doesn't matter if I request a PHP script or a blank text file, which rules a problem with scripts on the server out as well. I guess the result of the auth process is then cached for a while, so it isn't necessary for subsequent requests.

Note that the authentication always succeeds, so I can rule out a "the domain controller didn't answer" problem as well.

Apache 2.4 is running on Windows Server 2012 R2. It is configured for LDAP auth:

<Location />
    AuthType Basic
    AuthName "AD Login"
    AuthBasicProvider ldap
    LDAPReferrals Off
    #AuthLDAPUrl ldap://dc01.domain.de:3268/dc=ad,dc=domain,dc=de?sAMAccountName?sub?(objectClass=*)
    #AuthLDAPUrl ldap://ad.domain.de:389/dc=ad,dc=domain,dc=de?sAMAccountName?sub?(objectClass=*) STARTTLS
    AuthLDAPUrl ldap://ad.domain.de:389/dc=ad,dc=domain,dc=de?sAMAccountName?sub?(objectClass=*) TLS
    AuthLDAPBindDN "service@domain.de"
    AuthLDAPBindPassword "secret"
    Require valid-user
    Require all denied
</Location>

As you can see I tried different connection types to the domain controllers, it doesn't really seem to matter which encryption method I use, or if I pass on encryption at all.

ad.domain.de resolves to multiple domain controllers, but the behavior is the same if I connect to a specific DC.

No entries in the error log on LogLevel info, I'm yet hesitant to increase it to debug, as I know from experience that I have trouble sifting through the generated debug information.

Is there anything I missed yet that I can use to debug the problem, or do I have to go through debug level logging?

Gerald Schneider
  • 19,757
  • 8
  • 52
  • 79
  • What does the logs from the ldap server say? time to reply? You seem to include all information except from the backend that is supposed to give the reply, the ldap server. Perhaps the value is not indexed or similar and causing those initial delays. It is important to check the time the search takes in the ldap. – ezra-s Jan 11 '17 at 10:28
  • The domain controllers doesn't log login attempts via LDAP at all. – Gerald Schneider Jan 11 '17 at 10:56
  • Then how can you be sure your problem is on the HTTPD end and not the ldap server itself? Have you tried to make the same requests defined in the httpd directly/manually to the ldap server? – ezra-s Jan 11 '17 at 12:11
  • Btw in 2.4 you can increase LogLevel only for a certain module, as in `LogLevel authldap:trace8`. Note the "authldap" identifier is an example, I can't recall the name for that module to invoke it in loglevel, shouldn't take you too long to find out. – ezra-s Jan 11 '17 at 12:18
  • I don't know if it happens on the HTTPd or on the LDAP server. I only know it is somewhere in the notification chain (it doesn't happen for resources without authentication). It doesn't happen on other servers which use the same authentication either, that's why I assume it's a problem on the server itself. Changing the LogLevel for the module only is a good tip, I'll try that. – Gerald Schneider Jan 11 '17 at 13:16

1 Answers1

3

After increasing the log level for the modules authnz_ldap and ldap the following error message appeared in the error log:

ldap_simple_bind() timed out on reused connection, dropped by firewall?

This led me to this mod_ldap bug report, which, while it turned out to be a configuration error, pointed me to the actual problem:

As reported elsewhere, windows closes an LDAP connection after 900 seconds, but the default Apache behavior appears to try to re-use the connection indefinitely. If Apache tries to re-use after windows has closed the connection, there's a 60 second delay waiting for the connection to timeout [...]

Doing a few quick checks to confirm this:

The default value of MaxConnIdleTime in the Microsoft LDAP Policies is 900 seconds, which matches with my observation that the problem appears again after 15 minutes. The 60 second delay also matches exactly with my problem.

According to that error report the issue should be resolved by setting LDAPConnectionPoolTTL to a value lower than MaxConnIdleTime and other than the default value -1, but that didn't work for me. I had to set the value to 0, disabling the reuse of existing connections.

LDAPConnectionPoolTTL 0

I don't expect any performance issues with this, since the ldap results are cached anyway.

The only thing that remains a mystery is why this issue only occurs on our Apache instance running on Windows, but not with the Apache instances running on Linux.

Gerald Schneider
  • 19,757
  • 8
  • 52
  • 79
  • `LDAPConnectionPoolTTL 0` worked for me. Exact same problem we had here. – Brain Foo Long Jun 06 '18 at 07:56
  • This also helped resolve a [nearly identical problem](https://serverfault.com/questions/935783/subversion-intermittently-hangs-stalls-on-commit/938241) that had to do with Subversion. – kmote Nov 01 '18 at 17:35