6

I'm trying to design an architecture to handle a large number of TCP socket connections, and I'm unsure about its limits.

I'll have to handle ~20k concurrent TCP connections. These are long-polling connections: they'll remain connected for long periods of time and will each send data every minute.

Using threads is out of the question, since 20k threads would exhaust system resources. I'm planning to use gevent to handle that many concurrent connections, or even HAProxy in front of 2 servers (each running gevent) handling 10k connections each, for instance. Does that make sense? Does anyone have advice or experience using gevent with 10k+ connections? Does anyone have an idea of the hardware requirements for handling these connections? I saw some benchmarks showing lots of connection timeouts for gevent at 5k concurrent connections, which is not very promising for my problem.
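
To make the thread-versus-event-loop comparison concrete, here is a minimal sketch of the single-threaded, event-driven model. It uses the standard library's asyncio rather than gevent, purely so that it runs without third-party packages. The function names (`handle`, `client`, `run_demo`) and the newline-delimited echo protocol are assumptions for illustration, not the real application.

```python
import asyncio


async def handle(reader, writer):
    # One lightweight coroutine per connection instead of one OS thread.
    try:
        while True:
            line = await reader.readline()
            if not line:  # empty bytes means the peer closed the connection
                break
            writer.write(line)  # echo the message back
            await writer.drain()
    finally:
        writer.close()


async def client(port, payload):
    # Hypothetical test client: send one line, wait for the echo, disconnect.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(payload)
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def demo(n_clients):
    # Port 0 lets the OS pick a free port for the demo server.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        payloads = [b"ping %d\n" % i for i in range(n_clients)]
        return await asyncio.gather(*(client(port, p) for p in payloads))


def run_demo(n_clients=50):
    return asyncio.run(demo(n_clients))
```

All connections here are served from a single thread. It says nothing about how gevent itself behaves at 10k+ connections; file-descriptor limits and per-connection memory would still need to be measured separately.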

Note: I've already read about the C10k problem and the advice in "Million-user Comet Application".

Aldebaran

2 Answers

4

Use both in combination. Assign something on the order of 1,000 connections to each process. Use a manager to distribute connections and to spawn a new process once every existing handler process is saturated.
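
As a rough sketch of the manager's bookkeeping only (no sockets and no real processes), the idea could look like the following. The names `Worker`, `Manager`, `assign`, `release`, and the exact cap of 1,000 are assumptions for illustration; in a real deployment `_spawn_worker` would start an OS process running its own event loop.

```python
from dataclasses import dataclass, field

# "On the order of 1,000" per process; the exact figure is an assumption.
CONNECTIONS_PER_WORKER = 1000


@dataclass
class Worker:
    worker_id: int
    connections: set = field(default_factory=set)

    def is_saturated(self):
        return len(self.connections) >= CONNECTIONS_PER_WORKER


class Manager:
    def __init__(self):
        self.workers = []

    def _spawn_worker(self):
        # Placeholder: a real manager would launch a new handler process here.
        worker = Worker(worker_id=len(self.workers))
        self.workers.append(worker)
        return worker

    def assign(self, conn_id):
        # Use the first worker with spare capacity; spawn one if all are full.
        for worker in self.workers:
            if not worker.is_saturated():
                break
        else:
            worker = self._spawn_worker()
        worker.connections.add(conn_id)
        return worker.worker_id

    def release(self, conn_id):
        # Forget a closed connection so its slot can be reused.
        for worker in self.workers:
            worker.connections.discard(conn_id)
```

With these numbers, 20k connections end up spread over 20 workers. First-fit assignment is just the simplest policy to show; least-loaded assignment would be an equally plausible choice.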

Jeff Ferland

3

Well, since you can use 2 servers to handle 10k each, why not use 5 servers and handle 4k on each?

coredump
  • 2
    What makes you think that if I can have 2 servers I can also have more than double that amount lol? – Aldebaran Jul 02 '12 at 23:11
  • 2
    Because the hurdle in parallelising a task is going from 1 to many. If it is the case that you can't afford more than 2 servers, then perhaps your effort:value ratio is a bit skewed? – JamesRyan Jul 02 '12 at 23:21
  • 1
    @JamesRyan besides my utter amusement at pointing out that having two doesn't mean getting four, "Throw more hardware at it," isn't always the best answer, and sometimes it's a horrific answer. There's no reason a well-designed app can't handle keeping open 10k connections with modern processors and memory. – Jeff Ferland Jul 03 '12 at 03:03
  • 1
    Sure, except he seems to want to handle them in Python. Throwing more hardware at it is often a good idea if you pay for development time, because the annual cost of an extra server is about the same as only a couple of days of development. Particularly when you can get 90% of the optimisation done in the first 10% of the time, there is a point where struggling to get that extra little bit is not worth it. – JamesRyan Jul 03 '12 at 10:45