25

I have a PostgreSQL database with a multi-GB table (which contains a log of certain events). I need to pass the latest events to an analyst - let's say he only needs events from the last month.

How can I produce a dump of only those rows that have, say, created_at > '2012-05-01'?

Leonid Shevtsov

4 Answers

23

psql -c "COPY (SELECT * FROM my_table WHERE created_at > '2012-05-01') TO STDOUT;" source_db | psql -c "COPY my_table FROM STDIN;" target_db

gilad905
20

Another way is to use COPY or \copy (the psql command), something like:

COPY (SELECT * FROM big_table WHERE created_at > '2012-05-01') TO '/path/to/a/dump/file';
Milen A. Radev
  • 7
    But does it produce actual INSERT statements? More generally speaking, how do you get the dump back into the database? – Leonid Shevtsov May 28 '12 at 13:13
  • No, it doesn't produce valid SQL statements - its output is in CSV, "text" or binary format. But it can be imported into another DB using the same commands. – Milen A. Radev May 28 '12 at 13:34
  • 1
    @LeonidShevtsov from [this SO answer](http://stackoverflow.com/a/1746215/1098603) you can `COPY big_table FROM '/path/to/a/dump/file'`, though you might need the same PostgreSQL version on both instances. – Matthieu Sep 03 '15 at 07:21
  • For alternatives (psql client-side included), have a look at https://stackoverflow.com/questions/1517635/save-pl-pgsql-output-from-postgresql-to-a-csv-file – PEdroArthur Aug 16 '18 at 03:00
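
To illustrate the import direction discussed in the comments above, here is a minimal Python sketch that feeds a dump file back into a table through psql, using the same COPY ... FROM STDIN form that appears in the first answer. The function names are hypothetical, and the sketch assumes psql is on the PATH, the file is readable by the client, and the file was written in COPY's default text format.

```python
import subprocess


def psql_copy_in_command(table, database):
    """Build the psql argument list that loads COPY data from stdin."""
    return ["psql", "-d", database, "-c", f"COPY {table} FROM STDIN;"]


def load_from_file(path, table, database):
    """Stream the local dump file at `path` into `table` via psql's stdin."""
    with open(path, "rb") as dump:
        # check=True raises CalledProcessError if psql exits with an error.
        subprocess.run(psql_copy_in_command(table, database), stdin=dump, check=True)
```

For example, load_from_file("/path/to/a/dump/file", "big_table", "target_db") would run the import, provided the target table already exists with matching columns.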
10

DISCLAIMER: verbatim from https://stackoverflow.com/questions/1517635/save-pl-pgsql-output-from-postgresql-to-a-csv-file

Do you want the resulting file on the server, or on the client?

Server side

If you want something easy to re-use or automate, you can use PostgreSQL's built-in COPY command, e.g.

Copy (Select * From foo) To '/tmp/test.csv' With CSV DELIMITER ',';

This approach runs entirely on the remote server - it can't write to your local PC. It also needs to be run as a Postgres "superuser" (the default one is normally called "postgres") or, since PostgreSQL 11, as a member of the pg_write_server_files role, because Postgres can't stop it doing nasty things with that machine's local filesystem.

That doesn't actually mean you have to be connected as a superuser (automating that would be a security risk of a different kind), because you can use the SECURITY DEFINER option to CREATE FUNCTION to make a function which runs as though you were a superuser.

The crucial part is that your function is there to perform additional checks, not just bypass the security - so you could write a function which exports the exact data you need, or you could write something which can accept various options as long as they meet a strict whitelist. You need to check two things:

  1. Which files should the user be allowed to read/write on disk? This might be a particular directory, for instance, and the filename might have to have a suitable prefix or extension.
  2. Which tables should the user be able to read/write in the database? This would normally be defined by GRANTs in the database, but the function is now running as a superuser, so tables which would normally be "out of bounds" will be fully accessible. You probably don’t want to let someone invoke your function and add rows on the end of your “users” table…

I've written a blog post expanding on this approach, including some examples of functions that export (or import) files and tables meeting strict conditions.
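
The two checks listed above can be sketched as follows. This is written in Python purely to illustrate the logic; in the approach described here the same checks would live inside the SECURITY DEFINER function itself. The directory, the table whitelist and the function names are all hypothetical.

```python
import posixpath
import re

# Hypothetical policy values, chosen only for illustration.
EXPORT_DIR = "/var/lib/pg_exports"
ALLOWED_TABLES = {"events", "page_views"}


def check_export_path(filename):
    """Return the full path for `filename`, or raise if it breaks the policy."""
    # Bare file names only: no directory separators, no "..", and a .csv suffix.
    if not re.fullmatch(r"[A-Za-z0-9_-]+\.csv", filename):
        raise ValueError(f"file name not allowed: {filename!r}")
    return posixpath.join(EXPORT_DIR, filename)


def check_table(table):
    """Return `table` unchanged if it is on the whitelist, else raise."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table!r}")
    return table
```

Checking against a fixed set of table names, rather than trying to filter out bad ones, keeps a table such as "users" out of reach even though the function runs with superuser rights.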


Client side

The other approach is to do the file handling on the client side, i.e. in your application or script. The Postgres server doesn't need to know what file you're copying to, it just spits out the data and the client puts it somewhere.

The underlying syntax for this is the COPY TO STDOUT command, and graphical tools like pgAdmin will wrap it for you in a nice dialog.

The psql command-line client has a special "meta-command" called \copy, which takes all the same options as the "real" COPY, but is run inside the client:

\copy (Select * From foo) To '/tmp/test.csv' With CSV

Note that there is no terminating ;, because meta-commands are terminated by newline, unlike SQL commands.

From the docs:

Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.

Your application programming language may also have support for pushing or fetching the data, but you cannot generally use COPY FROM STDIN/TO STDOUT within a standard SQL statement, because there is no way of connecting the input/output stream. PHP's PostgreSQL handler (not PDO) includes very basic pg_copy_from and pg_copy_to functions which copy to/from a PHP array, which may not be efficient for large data sets.
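
As one example of such language support, Python's psycopg2 driver exposes COPY through the cursor method copy_expert(sql, file). The sketch below is only an illustration under assumptions: the helper names are hypothetical, the created_at column is borrowed from the question, and the cursor is expected to be a psycopg2 cursor (anything with a compatible copy_expert method will do).

```python
import datetime
import re


def build_copy_sql(table, since):
    """Return a COPY ... TO STDOUT statement for rows newer than `since`."""
    # The table name is interpolated into the SQL text, so accept only a
    # plain unquoted identifier.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    # Round-trip the date through datetime.date so only a valid date,
    # normalised to YYYY-MM-DD, ends up in the statement.
    cutoff = datetime.date.fromisoformat(since).isoformat()
    return (
        f"COPY (SELECT * FROM {table} WHERE created_at > '{cutoff}') "
        "TO STDOUT WITH CSV HEADER"
    )


def export_recent_rows(cursor, table, since, out):
    """Stream the matching rows into the file-like object `out`."""
    # copy_expert sends the COPY statement and writes the result to `out`.
    cursor.copy_expert(build_copy_sql(table, since), out)
```

With a real connection, you would open a cursor from psycopg2.connect(...), open a local file for writing, and call export_recent_rows(cursor, "my_table", "2012-05-01", f). The data is streamed to the client, so no file permissions on the server are involved.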

PEdroArthur
4

If the database user doesn't have permission to write files on the server, you can have psql run COPY ... TO STDOUT and redirect the output to a local file, like this:

psql -c "COPY (SELECT * FROM big_table WHERE created_at > '2012-05-01') TO STDOUT;" -h localhost -d my_database -U my_user > path/to/file
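
If the export has to run from a script rather than an interactive shell, the same command can be driven from Python. This is a minimal sketch, assuming psql is on the PATH and authentication is handled outside the script (for example through a .pgpass file); the function names are hypothetical.

```python
import subprocess


def psql_copy_out_command(query, database, user, host="localhost"):
    """Build the psql argument list that streams `query` to stdout via COPY."""
    return [
        "psql",
        "-h", host,
        "-d", database,
        "-U", user,
        "-c", f"COPY ({query}) TO STDOUT;",
    ]


def dump_to_file(query, path, database, user, host="localhost"):
    """Run psql and write its COPY output to the local file `path`."""
    cmd = psql_copy_out_command(query, database, user, host)
    # Passing a list (no shell) avoids quoting problems with the SQL text.
    with open(path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

Calling dump_to_file("SELECT * FROM big_table WHERE created_at > '2012-05-01'", "path/to/file", "my_database", "my_user") mirrors the shell command above.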