
I posted a similar question a while ago and thought the answer had fixed my issue, but unfortunately that was not the case.

I have a benchmark application which connects about 500 users to a Dovecot 2.2.5 mail server using IMAP (plaintext auth, no SSL). After Dovecot has processed about 300 users, the older connections start failing and I get errors on both the client side and the server side.

Here are some examples from the server side:

Sep 23 19:05:52 imap-login: Info: Login: user=<test1>, method=PLAIN, rip=10.0.0.6, lip=10.0.0.2, mpid=1492, secured, session=<GqSHtRHnpQAKAAAG>
Sep 23 19:05:53 imap-login: Info: Login: user=<test2>, method=PLAIN, rip=10.0.0.6, lip=10.0.0.2, mpid=1494, secured, session=<K1OMtRHnpgAKAAAG>
Sep 23 19:05:53 imap-login: Info: Login: user=<test3>, method=PLAIN, rip=10.0.0.6, lip=10.0.0.2, mpid=1495, secured, session=<S/6QtRHnpwAKAAAG>
Sep 23 19:05:53 imap-login: Info: Login: user=<test4>, method=PLAIN, rip=10.0.0.6, lip=10.0.0.2, mpid=1496, secured, session=<37CVtRHnqAAKAAAG>
...
(Gets to around user=<test330>, then this:)
Sep 23 19:08:03 master: Error: service(imap): Initial status notification not received in 30 seconds, killing the process
Sep 23 19:08:04 imap: Fatal: master: service(imap): child 1840 killed with signal 9
Sep 23 19:08:04 imap(test211): Info: Connection closed: Connection reset by peer in=105 out=917

After that, I see repeated logins from users numbered below 300, as well as logins from users up to 500.
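To put numbers on this pattern, the server log can be summarized with a short script. This is only a minimal sketch in Python, assuming the log lines look like the excerpts above; the function name `summarize` and the two regular expressions are made up for illustration and are not part of Dovecot.

```python
import re

# Patterns are assumptions based on the log excerpts shown above.
LOGIN_RE = re.compile(r"imap-login: Info: Login: user=<([^>]+)>")
KILL_RE = re.compile(
    r"master: Error: service\(imap\): Initial status notification not received"
)


def summarize(lines):
    """Return (distinct users logged in before the first kill,
    number of repeated logins by those users after it)."""
    before = set()
    relogins = 0
    killed = False
    for line in lines:
        if KILL_RE.search(line):
            killed = True
            continue
        match = LOGIN_RE.search(line)
        if not match:
            continue
        user = match.group(1)
        if not killed:
            before.add(user)
        elif user in before:
            # A user that was already connected had to log in again.
            relogins += 1
    return len(before), relogins
```

It could be fed the log file line by line, e.g. `summarize(open(path))`, where `path` is wherever Dovecot writes its log on the machine in question.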

On the client side, I'm barraged with the following messages:

2013-09-23 19:07:57:997 Warning: .doMyLongCommand received an SocketTimeoutException exception java.net.SocketTimeoutException: Read timed out
2013-09-23 19:07:57:997 ERROR: : Read timed out
2013-09-23 19:07:57:997 test211:Reconnecting user due to error condition during SELECT_INBOX

Here are some configuration options (from dovecot -a) that are related to serving a large number of clients simultaneously:

default_client_limit = 2003
default_idle_kill = 1 hours
default_process_limit = 1000
default_vsz_limit = 1024 M
mbox_dotlock_change_timeout = 5 mins
mbox_lock_timeout = 8 mins
service_count = 0 (in service imap-login {} )
mail_max_userip_connections = 1000

nproc and nofile are set very high (102400), so there shouldn't be any issues there.

I'm drawing a blank here. As far as I can tell, the root of the issue might be one of the following:

  1. Memory related, although 1 GB should be more than enough.
  2. Configuration related, but I did not see any other settings which seemed to affect this.
  3. Network related. But then, would it be able to connect and get through those initial 300 or so clients at all?

Any help would be greatly appreciated.

user991710

1 Answer


How is the CPU usage on the server? How about disk I/O? I suspect some system resource is overloaded: the service processes start too slowly, so the Dovecot master deems them unresponsive and kills them.
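A quick way to check this while the benchmark runs is to compare the load average with the number of CPUs. The following is just a rough sketch in Python, not a proper monitoring setup; `looks_overloaded` and its `factor` threshold are hypothetical names chosen for illustration. On Linux the load average also counts processes stuck in uninterruptible disk wait, so a high load together with low CPU usage would point towards disk I/O.

```python
import os


def looks_overloaded(load1, cpu_count, factor=1.0):
    """True when the 1-minute load average exceeds factor * number of CPUs."""
    return load1 > factor * cpu_count


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    load1, load5, load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load averages: {load1:.2f} {load5:.2f} {load15:.2f} on {cpus} CPU(s)")
    if looks_overloaded(load1, cpus):
        print("1-minute load is above the CPU count; check CPU and disk I/O")
```

Running it on the server every few seconds during the test should show whether the load climbs sharply around the time the first ~300 users have connected.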

This is the timeout you are running up against: http://hg.dovecot.org/dovecot-2.2/file/b9498573f0d0/src/master/service.h#l7

EDIT: It could also be a really strict per-user process count limit set via ulimit, or some SELinux feature (e.g. on CentOS) acting up.
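Note that the limits that matter are those of the running Dovecot processes, which may differ from what your login shell reports. On Linux they can be read from /proc/<pid>/limits. Here is a small sketch in Python that parses that file; `read_limits` is a hypothetical helper, and it assumes the usual column layout of that file (limit name, soft limit, hard limit, units, separated by runs of spaces).

```python
import re


def read_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines():
        # Columns are separated by two or more spaces; names contain single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3 or not fields[0].startswith("Max "):
            continue  # skips the header row and blank lines
        limits[fields[0]] = (fields[1], fields[2])
    return limits
```

For example, `read_limits(open(f"/proc/{pid}/limits").read())["Max processes"]`, with `pid` set to the PID of the Dovecot master process, would show the process count limit actually in effect.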

@user991710 Maybe you could explain in a bit more detail which commands your test client issues to the server and what kind of client it is.

thuovila