22

How many web servers does StackOverflow/ServerFault have?

If the answer is 'more than one', then how does it achieve session stickiness while using DNS polling?

  • Not really, but if it was phrased differently it could make an interesting question. –  Jun 26 '09 at 19:49
  • You should rephrase the question. Change the title to "How is session stickiness achieved across multiple web servers?" or something like that... – William Brendel Jun 26 '09 at 19:51
  • Could you do me a favor and show me the right phrasing? –  Jun 26 '09 at 19:52
  • 1
    The assumption that having multiple servers implies sticky sessions -- which are an abomination -- pains me. – womble Jun 27 '09 at 07:55

6 Answers

43

Large websites may be "load balanced" across multiple machines. In many load balanced setups, a user may hit any of the backend machines during a session. Because of this, several methods exist to allow many machines to share user sessions.

The method chosen will depend on the style of load balancing employed, as well as the availability/capacity of backend storage:

Session information stored in cookies only: Session information (not just a session identifier) is stored in a user's cookie. For example, the user's cookie might contain the contents of their shopping basket. To prevent users from tampering with the session data, an HMAC may be provided along with the cookie. This method is probably the least suitable for most applications:

  • No backend storage is required
  • The user does not need to hit the same machine each time, so DNS load balancing can be employed
  • There is no latency associated with retrieving the session information from a database machine (as it is provided with the HTTP request). Useful if your site is load-balanced by machines on different continents.
  • The amount of data that can be stored in the session is limited (by the 4K cookie size limit)
  • Encryption has to be employed if a user should not be able to see the contents of their session
  • HMAC (or similar) has to be employed to prevent user tampering of session data
  • Since the session data is not stored server-side, it's more difficult for developers to debug
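
A minimal sketch of the HMAC idea above, using only the Python standard library. The names `SECRET_KEY`, `encode_session` and `decode_session` are made up for illustration, and the payload is only signed, not encrypted, so the user can still read it:

```python
import base64
import hashlib
import hmac
import json

# Hypothetical shared secret; every web server would need the same key.
SECRET_KEY = b"replace-with-a-long-random-secret"


def encode_session(data: dict) -> str:
    """Serialize session data and append an HMAC-SHA256 tag."""
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + tag


def decode_session(cookie: str):
    """Return the session dict, or None if the cookie is malformed or altered."""
    payload, sep, tag = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Any server holding the same key can verify the cookie, which is why no backend storage or stickiness is needed with this method.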

Load balancer always directs the user to the same machine: Many load balancers can set their own session cookie, indicating which backend machine a user's requests are being sent to, and direct them to that machine in the future. Because the user is always directed to the same machine, session sharing between multiple machines is not required. This may be good in some situations:

  • An existing application's session handling may not need to be changed to become multi-machine aware
  • No shared database system (or similar) is required for storing sessions, possibly increasing reliability, but at the cost of complexity
  • A backend machine going down will take down any user sessions started on it, with it.
  • Taking machines out of service is more difficult. Users with sessions on a machine to be taken down for maintenance should be allowed to complete their tasks before the machine is turned off. To support this, web load balancers may have a feature to "drain" requests to a certain backend machine.
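
As a rough illustration of the cookie pinning and "drain" behaviour described above, here is a toy model in Python. It is not how any particular load balancer is implemented; the class name `StickyBalancer` and the cookie name `lb_backend` are assumptions made up for this sketch:

```python
import itertools


class StickyBalancer:
    """Toy load balancer that pins each client to one backend via a cookie."""

    COOKIE = "lb_backend"  # hypothetical cookie name

    def __init__(self, backends):
        self.backends = list(backends)
        self.draining = set()
        self._counter = itertools.count()

    def drain(self, backend):
        """Stop assigning new clients to `backend`; pinned clients keep using it."""
        self.draining.add(backend)

    def route(self, cookies: dict):
        """Return (backend, cookies_to_set) for one request."""
        pinned = cookies.get(self.COOKIE)
        if pinned in self.backends:
            # Existing session: always go back to the same machine.
            return pinned, {}
        # New client: round-robin over backends that are not draining.
        available = [b for b in self.backends if b not in self.draining]
        if not available:
            raise RuntimeError("no backend available for new sessions")
        backend = available[next(self._counter) % len(available)]
        return backend, {self.COOKIE: backend}
```

Once a drained backend has no pinned users left, it can be switched off without cutting anyone's session short.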

Shared backend database or key/value store: Session information is stored in a backend database, which all of the web servers have access to query and update. The user's browser stores a cookie containing an identifier (such as the session ID), pointing to the session information. This is probably the cleanest method of the three:

  • The user never needs to be exposed to the stored session information.
  • The user does not need to hit the same machine each time, so DNS load balancing can be employed
  • One disadvantage is the bottleneck that can be placed on whichever backend storage system is employed.
  • Session information may be expired and backed up consistently.
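
A sketch of the shared-store approach, with an in-memory dict standing in for the real database or key/value store. The `SessionStore` class and its methods are hypothetical, and a real deployment would put this behind a network service that all web servers can reach:

```python
import secrets
import time


class SessionStore:
    """Stand-in for a shared backend store that every web server can query."""

    def __init__(self, ttl_seconds=1800):
        self.ttl = ttl_seconds
        self._rows = {}  # session_id -> (expires_at, data)

    def create(self, data, now=None):
        """Store session data and return the opaque ID to put in the cookie."""
        now = time.time() if now is None else now
        session_id = secrets.token_urlsafe(32)
        self._rows[session_id] = (now + self.ttl, dict(data))
        return session_id

    def load(self, session_id, now=None):
        """Return the session data, or None if it is unknown or expired."""
        now = time.time() if now is None else now
        row = self._rows.get(session_id)
        if row is None or row[0] <= now:
            # Expire centrally, so all servers see the same result.
            self._rows.pop(session_id, None)
            return None
        return row[1]
```

The browser only ever holds the random `session_id`, so whichever web server receives the request can look the session up, and the stored data never leaves the backend.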

Overall, most dynamic web applications perform several database queries or key/value store requests, so the database or key/value store is the logical storage location of session data.

Tommeh
  • 2
    +1 Fairly comprehensive answer and saves me writing it. :) As far as db storage goes, a relational database is probably the wrong thing. Something like one of the persistent memcached forks is better. memcachedb might be suitable. You also missed off replicating session information between servers. It's not the best method, but things like tomcat do it, so worth documenting. – David Pashley Jun 26 '09 at 21:45
  • Which approach is utilized by Google, Twitter or Facebook? – Dannyboy Sep 15 '14 at 12:56
  • 1
    Not sure about Google, Twitter or Facebook, but Redis is a great fit for a session store. It's basically the "persistent memcached" David Pashley was recommending in 2009, when Redis was embryonic. – B Robster Jan 17 '15 at 17:00
4

If your question is how to maintain sessions across multiple front-end web servers, then the answer is usually to use a centralized database. Instead of relying on the web server instances to track session files on the local file systems, you'd write the session ids and data into a central DB, and all the web servers would retrieve the data from there instead.

  • +1 for mentioning centralized database. Just to expand/simplify on that idea a little. If you set a cookie on a user's PC with something unique such as a global user ID you can then store that GUID in a database. It won't matter what server a client connects to, as long as they have the GUID/cookie you'll be able to look them up against the database and track the session accordingly. – KPWINC Jun 26 '09 at 21:27
  • 2
    Storing sessions in a relational database is always a bad idea. You shouldn't use databases for storing transient data. – David Pashley Jun 26 '09 at 21:41
1

Using memcached seems to be a good solution, as mentioned by @David Pashley.

It means having a remote memcached instance shared by all servers and using the memcache PECL extension that provides its own session handler.

It only requires changing two parameters in the PHP configuration!

Here is a good tutorial http://www.dotdeb.org/2008/08/25/storing-your-php-sessions-using-memcached/

Tristan
0

IIRC, in DotNetRocks #440 they said one server, period. Don't know if that is still the case.

Edit: Actually it was Hanselminutes #134. Sorry.

Donald Byrd
0

You can set a cookie.

You can calculate a hash of the remote IP (at its simplest, odd numbered remote hosts go to server A, even numbered hosts go to server B).

Looks like you can also do it via some values that stay with the source system if you're using an SSL tunnel.

Typically each of the above mechanisms requires a "reverse proxy" server or a load balancer of some sort. That load balancer accepts the traffic and then directs it to whichever server initially had the session, based on one of the above criteria.
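
To make the remote IP hashing option concrete, here is a small Python sketch. The function names are invented for this example; `odd_even_backend` is the "at its simplest" rule from above, and `pick_backend` is one possible generalization to any number of servers, not something a specific load balancer is known to use:

```python
import hashlib
import ipaddress


def odd_even_backend(remote_ip: str) -> str:
    """Simplest rule: odd-numbered hosts go to server A, even ones to server B."""
    return "A" if int(ipaddress.ip_address(remote_ip)) % 2 == 1 else "B"


def pick_backend(remote_ip: str, backends):
    """Map a client IP to one backend by hashing the address."""
    packed = ipaddress.ip_address(remote_ip).packed
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

The same IP always maps to the same server as long as the backend list does not change; adding or removing a server remaps many clients, which drops their sessions unless those are stored somewhere shared.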

I'm not sure, though, what you mean by "DNS polling"

chris
0

a) You can store session information in the user's cookie. See stateless hardened cookies, which store no data on the server side but preserve session state: http://www.cl.cam.ac.uk/~sjm217/papers/protocols08cookies.pdf

b) You can change the session backend storage to a database or memcached. To eliminate the single point of failure, you can set up database replication or multiple memcached nodes. Note that memcached is recommended only in setups where losing a user's session state is not a serious error and will not make the user very unhappy. Where preserving state is vital, use a database. PHP, Django and Rails all allow developers to write a custom session backend.

Kristaps