There are quite a lot of "lower-level" details here.
First, consider that the kernel has a list of processes, and at any given time, some of these processes are running, and some are not. The kernel allows each running process some slice of CPU time, then interrupts it and moves to the next. If there are no runnable processes, then the kernel will probably issue an instruction like HLT to the CPU which suspends the CPU until there is a hardware interrupt.
Somewhere in the server is a system call that says "give me something to do". There are two broad categories of ways this can be done. In the case of Apache, it calls accept on a socket Apache has previously opened, probably listening on port 80. The kernel maintains a queue of connection attempts, and adds to that queue every time a TCP SYN is received. How the kernel knows a TCP SYN was received depends on the device driver; for many NICs there's probably a hardware interrupt when network data are received.
accept asks the kernel to return the next connection initiation. If the queue isn't empty, accept just returns immediately. If the queue is empty, the process (Apache) is removed from the list of running processes. When a connection is later initiated, the process is resumed. This is called "blocking", because to the process calling it, accept() looks like a function that doesn't return until it has a result, which might be some time from now. During that time the process can do nothing else.
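For concreteness, here is a minimal sketch in C of that blocking path: open a socket, bind it, listen, then block in accept(). The port number (8080, so it runs without root) and the do-nothing handling of the accepted connection are illustrative assumptions, not anything Apache specifically does.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0) { perror("socket"); exit(1); }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);          /* a real web server would use 80 */

    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("bind"); exit(1);
    }
    listen(listen_fd, 128);               /* the kernel's queue of pending connections */

    /* accept() blocks: the process is taken off the run queue until the
     * kernel has a completed connection to hand back. */
    int conn_fd = accept(listen_fd, NULL, NULL);
    if (conn_fd < 0) { perror("accept"); exit(1); }

    close(conn_fd);
    close(listen_fd);
    return 0;
}
```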
Once accept returns, Apache knows that someone is attempting to initiate a connection. It then calls fork to split the Apache process into two identical processes. One of these processes goes on to process the HTTP request; the other calls accept again to get the next connection. Thus there's always a master process which does nothing but call accept and spawn sub-processes, and then there's one sub-process for each request.
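A sketch of that master/sub-process loop might look like the following. handle_request() is a hypothetical stand-in for whatever actually parses and answers the HTTP request (a possible body is sketched further down), and listen_fd is assumed to be a listening socket set up as in the previous example.

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handler that serves one client on conn_fd; defined elsewhere. */
void handle_request(int conn_fd);

/* Fork-per-connection: the master does nothing but accept() and fork();
 * each child serves exactly one connection and then exits. */
void serve_forever(int listen_fd) {
    signal(SIGCHLD, SIG_IGN);                        /* let the kernel reap children */
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL); /* master blocks here */
        if (conn_fd < 0)
            continue;

        pid_t pid = fork();
        if (pid == 0) {                              /* child */
            close(listen_fd);                        /* child doesn't need the listener */
            handle_request(conn_fd);
            close(conn_fd);
            _exit(0);
        }
        close(conn_fd);                              /* parent: the child owns this fd now */
    }
}
```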
This is a simplification: it's possible to do this with threads instead of processes, and it's also possible to fork beforehand so there's a worker process ready to go when a request is received, thus reducing startup overhead. Depending on how Apache is configured it may do either of these things.
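For example, a pre-forking variant could look roughly like the sketch below: a fixed number of workers are forked up front and each one blocks in accept() on the same listening socket. This is a loose approximation of the idea, not Apache's actual implementation.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void handle_request(int conn_fd);   /* hypothetical handler, as above */

/* Pre-fork variant: spawn the workers up front; each worker blocks in accept()
 * on the shared listening socket, so no fork() happens per request. */
void prefork_serve(int listen_fd, int nworkers) {
    for (int i = 0; i < nworkers; i++) {
        if (fork() == 0) {                               /* worker process */
            for (;;) {
                int conn_fd = accept(listen_fd, NULL, NULL);
                if (conn_fd < 0)
                    continue;
                handle_request(conn_fd);
                close(conn_fd);
            }
        }
    }
    for (;;)
        wait(NULL);                                      /* parent just supervises */
}
```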
That's the first broad category of how to do it, and it's called blocking IO because the system calls like accept and read and write which operate on sockets will suspend the process until they have something to return.
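To make the read/write side concrete, the hypothetical handle_request() used above could be as simple as the sketch below; the canned HTTP response is an invented placeholder.

```c
#include <unistd.h>

/* Blocking handler sketch: read() suspends the process until the client sends
 * something; write() can suspend it if the kernel's send buffer is full. */
void handle_request(int conn_fd) {
    char buf[4096];
    ssize_t n = read(conn_fd, buf, sizeof buf);   /* blocks until data arrives */
    if (n <= 0)
        return;

    static const char resp[] =
        "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok";
    write(conn_fd, resp, sizeof resp - 1);        /* may also block */
}
```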
The other broad way to do it is called non-blocking or event-based or asynchronous IO. This is implemented with system calls like select or epoll. These each do the same thing: you give them a list of sockets (or, in general, file descriptors) and what you want to do with them, and the kernel blocks until it's ready to do one of those things.
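With select, for instance, that "list of sockets" is an fd_set. A sketch, assuming listen_fd and conn_fds[] are sockets opened elsewhere, might look like this:

```c
#include <stdio.h>
#include <sys/select.h>

/* Build an fd_set of the descriptors we care about, block until at least one
 * is readable, then check which ones. */
void wait_and_report(int listen_fd, const int *conn_fds, int nconns) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);            /* readable == new connection pending */

    int maxfd = listen_fd;
    for (int i = 0; i < nconns; i++) {
        FD_SET(conn_fds[i], &readfds);      /* readable == client sent data */
        if (conn_fds[i] > maxfd)
            maxfd = conn_fds[i];
    }

    /* The kernel blocks here until one of the descriptors is ready. */
    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
        return;

    if (FD_ISSET(listen_fd, &readfds))
        printf("a new connection is waiting to be accepted\n");
    for (int i = 0; i < nconns; i++)
        if (FD_ISSET(conn_fds[i], &readfds))
            printf("connection %d has data to read\n", conn_fds[i]);
}
```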
With this model, you might tell the kernel (with epoll), "Tell me when there is a new connection on port 80 or new data to read on any of these 9471 other connections I have open". epoll blocks until one of those things is ready, then you do it. Then you repeat. System calls like accept and read and write never block, in part because whenever you call them, epoll just told you that they are ready so there'd be no reason to block, and also because when you open the socket or the file you specify that you want it in non-blocking mode, so those calls will fail with EWOULDBLOCK instead of blocking.
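Putting that together, a bare-bones epoll event loop might look like the sketch below. It assumes listen_fd is a listening socket set up as earlier; the response-writing side is omitted and the buffer size is an arbitrary choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Mark a descriptor non-blocking so accept()/read() return EWOULDBLOCK
 * instead of suspending the process. */
static void set_nonblocking(int fd) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}

void event_loop(int listen_fd) {
    set_nonblocking(listen_fd);

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[64];
    for (;;) {
        /* "Tell me when the listener or any registered connection is ready."
         * This is the only place the process ever blocks. */
        int n = epoll_wait(epfd, events, 64, -1);

        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;

            if (fd == listen_fd) {
                /* New connection(s): accept until the queue is drained. */
                int conn;
                while ((conn = accept(listen_fd, NULL, NULL)) >= 0) {
                    set_nonblocking(conn);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = conn };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &cev);
                }
                /* accept() failing with EWOULDBLOCK just means "nothing left". */
            } else {
                /* Data on an existing connection: read what's there. */
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof buf);
                if (r > 0) {
                    /* ... parse the request, queue a response, etc. ... */
                } else if (r == 0 || (errno != EAGAIN && errno != EWOULDBLOCK)) {
                    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                    close(fd);               /* peer closed, or a real error */
                }
                /* r < 0 with EWOULDBLOCK: nothing to read right now. */
            }
        }
    }
}
```

Note that epoll_wait is the only call that ever sleeps; accept and read return immediately either way, which is what lets a single process juggle thousands of connections.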
The advantage of this model is that you need only one process. This means you don't have to allocate a stack and kernel structures for each request. Nginx and HAProxy use this model, and it's a big reason they can deal with so many more connections than Apache on similar hardware.
The keyword to understand is "server". In the server-client model (versus master-slave model) the server waits for requests from clients. These requests are events that need to be serviced. A webserver is an application program. Your question combines application SW with HW terminology (e.g. interrupt and NIC), rather than keeping related concepts at the same abstraction layer. The NIC driver may actually use polling sometimes, e.g. Linux NAPI drivers regress to polling when there is a flood of packets. But that is irrelevant to the event-processing application SW. – sawdust – 2014-11-10T02:20:17.607
@sawdust Very interesting. The question is really meant to understand the connection between the SW and HW processes – user2202911 – 2014-11-10T14:41:24.400
It’s very similar to the way command-line (and other GUI) programs listen to the keyboard. Especially in a window system, where you have the step of the kernel receiving the data from the keyboard device and handing it off to the window manager, which identifies the window that has focus and gives the data to that window. – G-Man Says 'Reinstate Monica' – 2014-11-11T01:10:20.560
@G-Man: In theory, yes. In reality most typists don't type at 1 Gbit/s, which justifies having two different architectures. One clean, flexible and slow, one clumsy but high-speed. – MSalters – 2014-11-12T00:25:36.053