9

I have configured cron to invoke pg_dump on a daily basis using following rule:

# xyz database backups:
00 01 * * * root umask 077 && pg_dump --user=xyz_system xyz | gzip > /var/xyz/backup/db/xyz/`date -u +\%Y\%m\%dT\%H\%M\%S`.gz

Basically, it works. The database grows relatively fast and exponentially (however the exponent isn't very big). Currently the gzipped dump takes about 160MB. When the database is dumped the system starts to crawl. The load average I saw using the top command was about 200, 200, 180. Basically the server is hardly responsive.

The first question is how to determine where the bottleneck is. Is the poor performance caused by heavy I/O operations? Is it caused by table locking issues? Maybe it is a memory issue? The output of the pg_dump command is piped to the gzip command. Is it sequential, i.e. entire dump is placed in the memory (swapping problem?) and then compressed or concurrent (i.e. gzip compresses what it gets and waits for more)? May it be caused by some other factor?

The second question is how to make the dumping operation less intrusive for main functions of the system. As far as I understand things, the dump can't take too much time because of database integrity. There are table write locks, etc. What can I make to limit the problems (or delay it, considering database growth).

The third question: Is it already time to learn about more advanced database configurations? The system works ok, when database backups are not performed, but maybe the db dumping issue is a first symptom of incoming problems?

Dariusz Walczak
  • 195
  • 1
  • 1
  • 5

2 Answers2

16

Wow. Amazing number of questions. I will try to address some, but this answer is not complete yet.

how to determine where the bottleneck is.

Use top first to see what's going on during the dump. Inspect process CPU usage, process status. D means "waiting for I/O".

Is the poor performance caused by heavy I/O operations?

Yes, most probably.

Is it caused by table locking issues?

Maybe. you could use pg_stat_activity system view to see what's going on in postgres during the dump.

Maybe it is a memory issue?

Very unlikely.

The output of the pg_dump command is piped to the gzip command. Is it sequential, i.e. entire dump is placed in the memory (swapping problem?)

No. gzip is a block compressor working in stream mode, it does not keep all input in memory.

and then compressed or concurrent (i.e. gzip compresses what it gets and waits for more)?

Yes, it compresses block by block, outputs and waits for more.

May it be caused by some other factor?

Yes.

As far as I understand things, the dump can't take too much time because of database integrity. There are table write locks, etc. What can I make to limit the problems (or delay it, considering database growth).

The dump duration has no effect on dump integrity. Integrity is ensured by using one transaction with repeatable read isolation level by all pg_dump process. There are NO table write locks for regular INSERT/UPDATE. Only some DDL operations (DROP TABLE, ALTER TABLE etc) can be blocked during backup.

Is it already time to learn about more advanced database configurations? The system works ok, when database backups are not performed, but maybe the db dumping issue is a first symptom of incoming problems?

Never too late. Start with http://wiki.postgresql.org/wiki/Performance_Optimization.

filiprem
  • 341
  • 1
  • 6
  • FWIW, I had issues with `pg_dump` 100% CPU and it was from gzip. Specifying `pg_dump --compress=0` solved it for me on Ubuntu 16.04. Backups were super fast after that too. Watch out for gzip compression in containers; might not do what you expect. – Ligemer Jul 31 '18 at 23:54
6

I recommend you to look at continuous archiving of postgresql. Here are the advantages over using pg_dump:

  1. No need to do a full backup each time. One full backup is enough at the beginning, but it is recommended to have a full backup every several days for example.
  2. Very much faster to restore when the DB grows in size.
  3. The ability to restore to some other point (Point-In-Time Recovery).
  4. You will be doing incremental backup every hour (30 min or so). This can be configured and depends also on update activity.

However, there are some drawbacks (which might not be a problem in most cases):

  1. Usually more space is required because these are binary backups. The DB folder can be compressed.
  2. You can not restore them on a different architecture (binary data).
Khaled
  • 35,688
  • 8
  • 69
  • 98