27

A postgres SELECT query ran out of control on our DB server and started eating up tons of memory and swap until the server ran out of memory. I found the particular process via ps aux | grep postgres and ran kill -9 pid. This killed the process and the memory freed up as expected. The rest of the system and postgres queries appeared to be unaffected. This server is running postgres 9.1.3 on SLES 9 SP4.

However, one of our developers chewed me out for killing a postgres process with kill -9, saying that it will take down the entire postgres service. In reality, it did not. I've done this before a handful of times and have not seen any negative side effects.

With that said, and after further reading, it looks like kill pid without the flags is the preferred way to kill a runaway postgres process, but per other users in the postgres community, it also sounds like postgres has "gotten better" over the years such that kill -9 on an individual query process/thread is no longer a death sentence.

Can someone enlighten me on the proper way to kill a runaway postgres process as well as the how disastrous (or benign) using kill -9 is with Postgres these days? Thanks for the insight.

Banjer
  • 3,854
  • 11
  • 40
  • 47

3 Answers3

35

voretaq7's answer covers the key points, including the correct way to terminate backends but I'd like to add a little more explanation.

kill -9 (ie SIGKILL) should never, ever, ever be your first-choice default. It should be your last resort when the process doesn't respond to its normal shutdown requests and a SIGTERM (kill -15) has had no effect. That's true of Pg and pretty much everything else.

kill -9 gives the killed process no chance to do any cleanup at all.

When it comes to PostgreSQL, Pg sees a backed that's terminated by kill -9 as a backed crash. It knows the backend might have corrupted shared memory - because you could've interrupted it half way through writing a page into shm or modifying one, for example - so it terminates and restarts all the other backends when it notices that a backend has suddenly vanished and exited with a non-zero error code.

You'll see this reported in the logs.

If it appears to do no harm, that because Pg is restarting everything after the crash and your application is recovering from the lost connections cleanly. That doesn't make it a good idea. If nothing else backend crashes are less well tested than the normal-functioning parts of Pg and are much more complicated/varied, so the chances of a bug lurking in backend crash handling and recovery are higher.

BTW, if you kill -9 the postmaster then remove postmaster.pid and start it again without making sure every postgres backend is gone, very bad things can happen. This could easily happen if you accidentally killed the postmaster instead of a backend, saw the database had gone down, tried to restart it, removed the "stale" .pid file when the restart failed, and tried to restart it again. That's one of the reasons you should avoid waving kill -9 around Pg, and shouldn't delete postmaster.pid.

A demonstration:

To see exactly what happens when you kill -9 a backend, try these simple steps. Open two terminals, open psql in each, and in each run SELECT pg_backend_pid();. In another terminal kill -9 one of the PIDs. Now run SELECT pg_backend_pid(); in both psql sessions again. Notice how they both lost their connections?

Session 1, which we killed:

$ psql regress
psql (9.1.4)
Type "help" for help.

regress=# select pg_backend_pid();
 pg_backend_pid 
----------------
           6357
(1 row)

[kill -9 of session one happens at this point]

regress=# select pg_backend_pid();
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
regress=# select pg_backend_pid();
 pg_backend_pid 
----------------
           6463
(1 row)

Session 2, which was collateral damage:

$ psql regress
psql (9.1.4)
Type "help" for help.

regress=# select pg_backend_pid();
 pg_backend_pid 
----------------
           6283
(1 row)

[kill -9 of session one happens at this point]

regress=# select pg_backend_pid();
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
regress=# select pg_backend_pid();
 pg_backend_pid 
----------------
           6464
(1 row)

See how both sessions were broken? That's why you don't kill -9 a backend.

Craig Ringer
  • 10,553
  • 9
  • 38
  • 59
  • 1
    All very good answers here, and very humbling I might add. I could mark them all as accepted, but @Craig Ringer has some extra points here and really drives it homw. Thanks again SF for cleansing me of my bad habits! – Banjer Aug 09 '12 at 16:00
  • 2
    @Craig: What a superb response; and to include a demonstration, I wish I could up vote this 100x. I'm a software developer who works with PG daily and since the 6.x days and your response is spot on! Nice! – Kilo Aug 10 '12 at 00:46
  • 2
    Nice answer. An addendum: if you have a backend process that absolutely will not die--not with `pg_terminate_backend`, not with a server stack restart, not with anything, you can kill it however you want, but make sure you have a working backup of your database. You can do that a couple of ways: you can use `pg_basebackup` or similar (or just `rsync` and `pg_start\stop_backup`) to back up your data directory (test the backups before continuing!), or you can use `pg_dump[all]` to salvage your data. Only then should you consider `kill -9`, or a reboot, or whatever. – Zac B Sep 20 '12 at 21:12
  • 1
    @ZacB Yep, and if you kill it make sure *all* the backends die. Most vitally, *never* delete `postmaster.pid`. Ever. – Craig Ringer Sep 20 '12 at 22:56
30

I found the particular process via ps aux | grep postgres and ran kill -9 pid.
NO! BAD! STEP AWAY FROM THE BACKEND!

Seriously -- Don't kill Postgres backends like that -- TERRIBLE things can happen (even with all the stability enhancements that have been made since the 7.x days) which can trash your whole DB, and your developer is quite right to chew you out for doing this.

There is, in fact, a blessed and approved way of doing this from within Postgres -- It's even in the Postgres manual though that SO post does a better job of explaining it...

SELECT pg_cancel_backend(pid)
Sends a cancel (SIGINT) signal to the specified backend, which cancels the currently running query.

select pg_terminate_backend(pid)
Sends a terminate (SIGTERM) signal to the specified backend, which cancels the query and aborts the backend (dropping its connection).

Backend IDs can be obtained from the pg_stat_activity table (or ps)

voretaq7
  • 79,345
  • 17
  • 128
  • 213
  • 4
    In case anybody's wondering about the terrible things, given that `kill -9` is not dissimilar to suddenly powering the system off as far as the killed process is concerned: Pg is very tolerant of backend crashes (like a `kill -9`) and there should never be data corruption. There *will* be corruption if you kill the *postmaster*, remove postmaster.pid, and restart it without also killing every backend first. That *will* destroy your database, but takes lots more than just a `kill -9` to a backend. `kill -9` doesn't give the postmaster time to kill the backends, which is why it's dangerous. – Craig Ringer Aug 09 '12 at 23:45
  • 2
    ... like an emergency consulting case I had last week did. Badly corrupted their database, lost two days of work because their backups were failing (and they didn't auto-test their restores), were down for 48 hours. Don't delete `postmaster.pid`. – Craig Ringer Oct 22 '14 at 22:37
8

Killing a PostgreSQL client process should be fine. Killing a PostgreSQL daemon process might get you scolded.

Since SQL daemons have internal process controls as well, the preferred way is to try using that channel first.

See Stop (long) running SQL query in PostgreSQL... from StackOverflow.

Jeff Ferland
  • 20,239
  • 2
  • 61
  • 85
  • 4
    `kill -9` should never be your default choice anyway, it's a last resort. Send a `SIGTERM` with `kill -TERM` or plain `kill` and if the recipient doesn't respond after a while, only then should you consider `kill -KILL` (`kill -9`). – Craig Ringer Aug 09 '12 at 00:27