
I'm using CouchDB to serve thousands of requests per second. Under heavy load it seems to respond slowly, so I began to run tests with Apache Bench (ab). CouchDB can handle 50k requests at 1k concurrency. I then raised the concurrency to 2k, but the benchmark always breaks at around 8k requests with the message:

apr_socket_recv: Connection reset by peer (104)
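
For reference, the ab invocations were roughly of this shape (the URL and database name are placeholders, not my actual setup):

ab -n 50000 -c 1000 http://127.0.0.1:5984/mydb/some_doc    # completes fine
ab -n 50000 -c 2000 http://127.0.0.1:5984/mydb/some_doc    # dies at around 8k requests with the error above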

In the CouchDB log I can find these two errors:

[Sat, 21 Nov 2015 17:16:07 GMT] [error] [<0.8073.2>] {error_report,<0.31.0>,
                      {<0.8073.2>,crash_report,
                       [[{initial_call,
                          {mochiweb_acceptor,init,
                           ['Argument__1','Argument__2','Argument__3']}},
                         {pid,<0.8073.2>},
                         {registered_name,[]},
                         {error_info,
                          {exit,
                           {error,accept_failed},
                           [{mochiweb_acceptor,init,3,
                             [{file,"mochiweb_acceptor.erl"},{line,34}]},
                            {proc_lib,init_p_do_apply,3,
                             [{file,"proc_lib.erl"},{line,239}]}]}},
                         {ancestors,
                          [couch_httpd,couch_secondary_services,
                           couch_server_sup,<0.32.0>]},
                         {messages,[]},
                         {links,[<0.105.0>]},
                         {dictionary,[]},
                         {trap_exit,false},
                         {status,running},
                         {heap_size,233},
                         {stack_size,27},
                         {reductions,330}],
                        []]}}

and this one:

[Sat, 21 Nov 2015 17:11:54 GMT] [error] [<0.105.0>] {error_report,<0.31.0>,
                        {<0.105.0>,std_error,
                         {mochiweb_socket_server,297,
                             {acceptor_error,{error,accept_failed}}}}}

Sadly, I don't understand what they mean.

What I've done so far to increase the resources available to CouchDB:

  • Raised the file descriptors limit to 250k, both hard and soft
  • Raised "System resource limits" as described here:
    • export ERL_MAX_PORTS=8192 (although I think this is deprecated)
    • export ERL_MAX_ETS_TABLES=6000
    • export ERL_FLAGS="+Q 350000 +P 750000 +A 100"
  • Raised almost all values in CouchDB configuration
  • I've also read about ports being stuck in TIME_WAIT, but after a benchmark only around 280 ports are in that state (a quick way to check is sketched after this list)
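
For completeness, these are roughly the checks I ran (plain shell on Ubuntu; beam.smp is the Erlang VM process CouchDB runs in):

ulimit -Hn                           # hard open-file limit for the current shell
ulimit -Sn                           # soft open-file limit for the current shell
ss -tan state time-wait | wc -l      # count of sockets sitting in TIME_WAIT after a run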

And nothing worked.

For these tests I'm using a VM with:

  • Ubuntu 14.04.2
  • CouchDB 1.5.0
  • Erlang R16B03 (erts-5.10.4)
Parziphal
  • What OS are you on? I'm wondering if you're running out of local ports, if you've got a port limit of 32k. – Jenny D Nov 21 '15 at 17:59
  • @JennyD I edited the question. – Parziphal Nov 21 '15 at 18:04
  • The error comes from mochiweb, which is an Erlang library for handling HTTP. The error occurs here: https://github.com/mochi/mochiweb/blob/master/src/mochiweb_acceptor.erl#L34 There is a very old bug here: https://issues.apache.org/jira/browse/COUCHDB-536 . Please look at the info given in the comment "Clement Law added a comment - 16/Dec/14 08:15" – awenkhh Dec 01 '15 at 09:52

1 Answer


CouchDB's Kxepal linked me to this email, which says:

One common gotcha with limits.conf is that couchdb su's to the couchdb user and /etc/pam.d/su in debian/ubuntu defaults to not respecting limits.conf, you need to enable it.

So I opened /etc/pam.d/su and found:

# Sets up user limits, please uncomment and read /etc/security/limits.conf
# to enable this functionality.
# (Replaces the use of /etc/limits in old login)
#session    required   pam_limits.so

I uncommented that last line, restarted the VM, and now CouchDB supports as many concurrent requests as I want. It had been ignoring the limits configuration all along.
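
After uncommenting, the line is simply:

session    required   pam_limits.so

As a rough way to double-check (assuming the Erlang VM shows up as beam.smp and runs as the couchdb user), you can compare the limit the running process actually got against what limits.conf specifies:

grep 'open files' /proc/$(pgrep -n -u couchdb beam.smp)/limits    # should now report the raised nofile limit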

I also learned/realized that what actually requires a lot of file descriptors is the benchmark tool itself, not the service being benchmarked. Maybe I should run the benchmark from a different VM.

Parziphal
  • Thanks, this was part of the solution for me. As per http://wiki.apache.org/couchdb/Performance, for 10k connections, I also had to edit /etc/security/limits.d/100-couchdb.conf and insert the lines `* hard nofile 10000` and `* soft nofile 10000`. Note: Specifying `*` for the domain was essential as using `couchdb` for the domain did not work. – redgeoff Oct 31 '16 at 13:36
  • Yes, those are the file descriptors; I mention in the question that I raised them. So we could say the answer assumes the file descriptors were already raised. Nice to know this could help someone. – Parziphal Oct 31 '16 at 17:57