11

I am regularly saving and later restoring a smallish PostgreSQL database, which is used for testing. Its data is updated regularly as a result of tests, then a new dump must be made, and the dumps are regularly used to recreate the database in a well-defined state.

I noted that the dump (using pg_dump -Fc database) only takes a few seconds, but the restore (pg_restore -d database) takes about a minute. This seems weird. I would have expected both to take about the same time (assuming both tasks are I/O-bound).

Is there some problem with the restore? Could I maybe make it faster? Or is it normal for restore to take much longer than dump? (And if yes, then why?)

The dump file usually has about 3-4 MiB; the DBMS is PostgreSQL V8.4, running on a Pentium4 3GHz with 1GiB RAM under Ubuntu Linux.

sleske
  • 9,851
  • 4
  • 33
  • 44

2 Answers2

11

The content of an index is not part of the backup, only the definition of the index. And that will only take a few bytes. When the index is created during restore and all data is indexed, it will be much bigger. This will take time, but it depends on your situation how much time.

pg_restore does have an option for concurrent restore (as of version 8.4), use --jobs=number-of-jobs

Frank Heikens
  • 1,208
  • 6
  • 8
  • Interesting, thanks. Is there a way to dump the index as well, to speed up the restore (at the cost of a larger dump file)? – sleske Mar 02 '11 at 12:34
  • No, the content of the index can't be part of the backup. For a very small database like yours (3-4 MiB), it shouldn't be a problem anyway. – Frank Heikens Mar 02 '11 at 12:43
  • Additional information: pg_dump doesn't have access to the content of an index. pg_dump uses SELECT statements to get all content of the tables and content of the system tables to create the backup. It's "just" a wrapper around some SELECT statements and some functions to write the results to disk. – Frank Heikens Mar 02 '11 at 12:58
  • @Frank: Thanks. Didn't know about pg_dump's implementation. In our case, speeding up the restore would be helpful, because it needs to run repeatedly as part of automated tests, so bringing it down from 1 minute to say 10s would help. But apparently that's not feasible. I'll have to find a different solution... – sleske Mar 02 '11 at 13:34
  • A 3 to 4 MiB backup file should be restored in a few seconds, not minutes. Even when using older hardware. Did you check http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for the configuration? – Frank Heikens Mar 02 '11 at 13:54
  • 2
    @sleske you might try with the [filesystem backup approach](http://www.postgresql.org/docs/8.4/static/backup-file.html). This should preserve indexes and, in addition, probably run a bit faster for both backup and restore – Stefano May 27 '11 at 12:41
  • @Stefano: Interesting option; I didn't know this was possible. I don't think this is feasible for us, though, as has quite a few constraints (need to shutdown DBMS, need to dump/restore *all* databases). I fear this is only feasible if you have a PostgreSQL installation all to yourself, which I cannot assume. – sleske May 27 '11 at 14:27
  • @sleske yeap, I understand, on a 24/7 production server that definitely is not a good option. Only good to dump the FULL cluster too, which might not be handy if you only needed one db. We do use it ourselves from time to time, setting up a "maintenance" page for a couple of minutes, when we really want to replicate our full base for deeper tests on a dev machine... – Stefano May 27 '11 at 15:40
  • @Stefano: Actually, production issues are not a problem, as I need the dump/restore for automated Q/A tests. However, the Q/A environment should be easy to set up (because ideally each dev should have his/her own, in addition to several company-wide environments), and requiring a dedicated DBMS installation with all kinds of backup scripts is something I would like to avoid. – sleske May 27 '11 at 16:13
4

For a restore, the database has to do a lot of additional work:

Some things come to mind immediately:

  • Writing is slower than reading
  • Parsing the input takes time
  • Updating indexes and other internal structures
  • Maintaining referential integrity

Not sure if this amounts to that time difference, though.

Sven
  • 97,248
  • 13
  • 177
  • 225