I have configured cron to invoke pg_dump on a daily basis using following rule:
# xyz database backups:
00 01 * * * root umask 077 && pg_dump --user=xyz_system xyz | gzip > /var/xyz/backup/db/xyz/`date -u +\%Y\%m\%dT\%H\%M\%S`.gz
Basically, it works. The database grows relatively fast and exponentially (however the exponent isn't very big). Currently the gzipped dump takes about 160MB. When the database is dumped the system starts to crawl. The load average I saw using the top
command was about 200, 200, 180
. Basically the server is hardly responsive.
The first question is how to determine where the bottleneck is. Is the poor performance caused by heavy I/O operations? Is it caused by table locking issues? Maybe it is a memory issue? The output of the pg_dump
command is piped to the gzip
command. Is it sequential, i.e. entire dump is placed in the memory (swapping problem?) and then compressed or concurrent (i.e. gzip compresses what it gets and waits for more)? May it be caused by some other factor?
The second question is how to make the dumping operation less intrusive for main functions of the system. As far as I understand things, the dump can't take too much time because of database integrity. There are table write locks, etc. What can I make to limit the problems (or delay it, considering database growth).
The third question: Is it already time to learn about more advanced database configurations? The system works ok, when database backups are not performed, but maybe the db dumping issue is a first symptom of incoming problems?