Many programs such as sshd create .pid files in /var/run/ that contain their process ID. Are these files reliable for determining whether a process is running? My guess is that these files are created manually by a process, and therefore will still remain in the file system if the program crashes.
6 Answers
In simple terms, no: a process (e.g. a daemon) can crash without having time to remove its .pid file.
A technique to be more certain of the state of a program: use an explicit communication channel such as a socket. Write the socket port in a file and have the supervisor process look it up.
You can also use the services of DBus on Linux: register a specific name and have your supervisor process (whatever you call it) check for that name.
There are numerous techniques.
One thing to remember: it is not the OS' responsibility to manage the PID files.
-
The existence of the pid file, COMBINED with the existence of the process however, should be sufficient. If the process quit, you can check that. PIDs do get reused, but not very often. – MarkR Feb 22 '10 at 21:47
-
How often a PID gets reused depends upon the particular system in question. I've seen a system where PIDs cycled at least daily. You have to check the pid, that there's a process, and that the process appears to be the one you expect to own the pid. – Feb 22 '10 at 21:59
-
@atk: exactly. There isn't a standard per-se and even if there was one, it can very well be not respected by some implementations. E.g. I can craft a daemon that doesn't write a PID file at all and use a back channel to get its management commands from. – jldupont Feb 22 '10 at 22:05
-
@atk: unfortunately, there's no way to ensure that the PID does not get reused between time of check and time of use ... – SamB Mar 23 '15 at 03:25
Jldupont is correct in stating that .pid files are not reliable for determining whether a process is running as the file may not be removed in the event of a crash.
Race conditions aside, I often use pgrep when I need to know if a process is running. I could then cross-reference the output against the .pid file(s) if I felt it necessary.
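A minimal sketch of that cross-reference, assuming pgrep is installed and that the pidfile holds a single PID. The function name pidfile_matches_name is hypothetical, and the check is still subject to the race conditions mentioned above.

```sh
#!/usr/bin/env sh
# pidfile_matches_name FILE NAME (hypothetical helper)
# Exit 0 if the PID stored in FILE is among the PIDs that pgrep reports
# for processes whose name is exactly NAME.
pidfile_matches_name() {
    # An unreadable or missing pidfile counts as "no match".
    [ -r "$1" ] || return 1
    # pgrep -x matches the process name exactly and prints one PID per line;
    # grep -Fqx succeeds only if one of those lines equals the pidfile's PID.
    pgrep -x "$2" | grep -Fqx "$(cat "$1")"
}
```

For example, `pidfile_matches_name /var/run/sshd.pid sshd && echo "sshd is running"`.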
A file containing a process ID is not reliable to determine whether a process is running or not. It is only a reliable source for the last process ID the process was given.
When you have the process ID, you have to do further checking to see whether the process is really running.
Here is an example:
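#!/usr/bin/env sh
# Check whether the PID recorded in the pidfile belongs to a live sshd process.
file="/var/run/sshd.pid"
if [ ! -f "${file}" ]; then
    echo "File does not exist: ${file}"
    exit 1
fi
if [ ! -r "${file}" ]; then
    echo "Insufficient file permissions: ${file}"
    exit 1
fi
# Read the PID only after confirming that the file exists and is readable.
processid=$(cat "${file}")
# Ask ps for the command name of that PID; ps exits non-zero if no such process exists.
if psoutput=$(ps -p "${processid}" -o comm=); then
    if [ "${psoutput}" = "sshd" ]; then
        echo "sshd process is really running with process id ${processid}"
        exit 0
    else
        echo "given process id ${processid} is not sshd: ${psoutput}"
        exit 1
    fi
else
    # A stale pidfile is a failure too, so report it with a non-zero status.
    echo "there is no process running with process id ${processid}"
    exit 1
fi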
pgrep is a nice command, but you'll get into trouble when you have multiple instances running. For example, when you have a regular sshd running on port TCP/22 and another sshd running on port TCP/2222, pgrep will return two process IDs when searching for sshd. When the normal sshd has its PID in /var/run/sshd.pid and the other has its PID in /var/run/sshd-other.pid, you can clearly differentiate the processes.
I do not recommend using just ps, piping through one or multiple pipes with grep and grep -v to filter out everything that does not interest you. It is a bit like using
find . | grep myfile
to figure out whether a file exists.
It is not reliable to simply check the existence of a process with the same pid as contained in the file.
But many pidfile implementations also do locking on the pidfile, so that if the process dies, the lock goes away. Provided the locking mechanism is reliable, checking to see if the file is still locked is a relatively reliable mechanism for determining whether the original process is still running.
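A rough sketch of such a lock check, assuming the daemon holds a flock(2)-style lock on its pidfile and that the flock(1) utility from util-linux is available. The function name pidfile_locked is hypothetical. Daemons that lock with fcntl()/lockf() instead would not be detected this way on Linux, and any other flock failure is also reported as "locked" here.

```sh
#!/usr/bin/env sh
# pidfile_locked FILE (hypothetical helper)
# Exit 0 if somebody else currently holds a flock(2) lock on FILE.
pidfile_locked() {
    # No readable pidfile: treat the process as not running.
    # (This also stops flock from creating the file as a side effect.)
    [ -r "$1" ] || return 1
    # flock -n tries to take the lock without blocking and runs `true`
    # only if it succeeds.
    if flock -n "$1" true 2>/dev/null; then
        return 1    # we got the lock, so the original owner is gone
    fi
    return 0        # lock is held elsewhere (assumed: by the daemon)
}
```

For example, `pidfile_locked /var/run/mydaemon.pid && echo "still running"`, where the path is only a placeholder.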
Jldupont is correct.
You can, however, send the process a 0 signal (kill -s 0 pid) to see if the process is still alive (assuming you have the authority to send such a signal -- in general, only the owner of a process may send it a signal).
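A small sketch of the signal 0 check. The function names pid_alive and pidfile_pid_alive are hypothetical. Note that a failure can mean either "no such process" or "not permitted to signal it", and a success only says that some process has that PID, not which program it is.

```sh
#!/usr/bin/env sh
# pid_alive PID (hypothetical helper)
# Exit 0 if signal 0 can be delivered to PID. Nothing is actually sent;
# the kernel only performs the existence and permission checks.
pid_alive() {
    # Reject empty or non-numeric input, and 0 (which would address the
    # caller's whole process group instead of a single process).
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -gt 0 ] || return 1
    kill -0 "$1" 2>/dev/null
}

# pidfile_pid_alive FILE (hypothetical helper)
# Exit 0 if FILE is readable and the PID it contains passes pid_alive.
pidfile_pid_alive() {
    [ -r "$1" ] || return 1
    pid_alive "$(cat "$1")"
}
```

For example, `pidfile_pid_alive /var/run/sshd.pid || echo "stale or missing pidfile"`.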
-
But checking for the existence of a process with that PID doesn't mean that it's the process you're interested in. – janm Feb 23 '10 at 00:37
I agree with jschmier.
On some systems, you do not get access to pgrep. In such a case, you can do ps -aef | grep <pid> to find out if the process is really running.
-
The key point in the question was "reliable". Doing a ps and looking for a PID is not reliable. – janm Feb 23 '10 at 00:38
-
Well... assuming that you know the name of the program, why do you think ps -aef | grep is unreliable? – user29584 Feb 23 '10 at 01:13
-
Race conditions: the state of the system has changed by the time ps has finished. Process titles: another process could have a similar title to the one you're interested in. Multiple instances: Consider a system with two instances of the same service, each with a PID file. One fails, and the other restarts and gets the PID of the first service. How do you tell? Etc. Not reliable, impossible to get right because of race conditions, and there are reliable techniques that just work. For a reliable alternative see, for example, http://cr.yp.to/daemontools.html – janm Feb 23 '10 at 01:26