How to compare databases?

1

I tried using meld on two dumps from two different databases from the same RDBMS, but it seems that those dumps are not comparable because not being canonical.

I tried both pg_dump ... (standard) and pg_dump -a ... (only data) but both seem not being canonical, they are not so different, but the side to side difference shows a lot of false positives.

I searched for an applicable option in pg_dump without success, maybe a special tool is needed for this?

174140

Posted 2014-10-14T07:15:41.340

Reputation: 859

Every db platform I've worked with required an external utility for this purpose, and Postgre is probably no differant. check this out: http://www.devart.com/dbforge/postgresql/datacompare/

– Frank Thomas – 2014-10-14T07:27:59.367

Okay, but maybe for pg_dump maintainer adding an option for canonical output would be no big deal. If meld is OK, and it is, isn't it the natural approach trying to leverage it? – 174140 – 2014-10-14T07:50:11.000

Answers

1

The easiest solution may be rather manual:

  1. Dump the schema and table names.
  2. Loop over schema and table names, dumping each in turn (making sure to order by a set of columns giving a unique ordering) to the same file or a file per table.
  3. Repeat for the other database.

This should get you an easily comparable output.

l0b0

Posted 2014-10-14T07:15:41.340

Reputation: 6 306

I like it and at some point it's sure I'll give a try. However I'm not sure the schema will be comparable (they are mutant schemas of hundreds of tables and constraints), but I like very much the idea of passing an ORDER BY or something to the dump instruction. It would be needed also an additional step to abbreviate binary/base64/blob columns, but there should be alternatives for coding that, e.g. awk, sed, Python. – 174140 – 2014-10-15T18:52:09.977

1Just get the checksum of binary or other excessively long columns. At least then you can tell with certainty whether the values are different. – l0b0 – 2014-10-16T07:30:26.063

I don't know a way of sorting the rows other than with an external script that performed the row sort for each table (lines between lines matching "^COPY" and "^\.") and copied verbatim all the remaining information in a pg_dump text dump. – 174140 – 2014-10-31T10:02:13.517

You can do order by 1, 2, 3, [etc] to simply order by all the columns. – l0b0 – 2014-10-31T10:47:40.267

I was thinking of using only pg_dump... Yeah but having dirtrees support in the comparison tool, is OK dumping the tables' contents to single files using psql. But I will try not to use that approach, so that the binary checksum will be a script option, it will be possible to have a single file, and that single file would be restorable whenever binary columns would not have been checksummed. I'm trying to develop something asap... – 174140 – 2014-10-31T13:31:25.397

Favoring my approach of sorting the rows outside SQL am having results enough good for me, to date. Source.

– 174140 – 2014-11-05T14:44:15.013

Note to self: fix that broken link. :/ – 174140 – 2019-07-31T08:53:26.807

Done!!!!!!!!!!! – 174140 – 2019-08-16T19:27:07.007

Excellent! Would you mind setting this answer as accepted? – l0b0 – 2019-08-16T20:47:38.323

Can't do that because it seems I was forced to sort out COPY lines at the Shell level. I upvoted aeons ago though. – 174140 – 2019-08-16T20:57:46.343