Hi, on an EC2 server I'm using the following to gzip SQL from another server:

mysqldump -h $HOST -u $UNAME -p$PWORD --single-transaction $1 | gzip -5 > $2_`date +%d`.sql.gz

At the moment the SQL data is 560 MB, and here is the output of "free":

             total       used       free     shared    buffers     cached
Mem:       2049568    1731356     318212        360     144328     529472
-/+ buffers/cache:    1057556     992012
Swap:            0          0          0

I was wondering how it would work if I had 1 GB or 2 GB of SQL data. Does it gzip the data as it receives it, to minimize RAM usage? Or does it fetch the whole SQL dump first and then gzip it?

Luke Wenke
  • [Read this](https://serverfault.com/questions/449296/why-is-linux-reporting-free-memory-strangely) about free memory. Pretty sure gzip can be streamed, because web servers do it, so it could use relatively little RAM if required. Try it and see; if it runs out of RAM you might need to tweak the command you use. – Tim May 18 '17 at 01:49

1 Answer

Other than the dump on the remote system, this command can use surprisingly little memory. Mysqldump can page the data files into memory as needed. Indexes are unlikely to be used, so not all the blocks in the data files need to be read. Besides the extra I/O to read blocks in from disk, there may be additional I/O to replace them in the buffer pool. As this is happening on another system, the local impact is only a small amount of memory for network data buffers, plus the small amount mysqldump needs to construct the output.

mysqldump -h $HOST -u $UNAME -p$PWORD --single-transaction $1 
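
As a side note, mysqldump's default option set (`--opt`) already includes `--quick`, which retrieves and writes rows one at a time rather than buffering whole result sets on the client. A rough sketch making that explicit (the variables are the ones from the question's script; `pv` is optional, and assuming it is installed it only displays throughput):

# Sketch only: --quick is already the default via --opt, shown here explicitly.
# pv (if installed) just shows the stream going by; it does not change memory usage.
mysqldump -h "$HOST" -u "$UNAME" -p"$PWORD" --single-transaction --quick "$1" \
  | pv \
  | gzip -5 > "$2"_$(date +%d).sql.gz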

On Linux/Unix platforms the pipe will only use enough memory to buffer the data being passed to gzip. gzip will buffer some data internally so that it can optimize the compression. Higher compression levels may buffer more data and will require more CPU. The data from mysqldump is highly compressible, as there are long repeated strings in the output.

| gzip -5
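
If you want to measure that trade-off yourself, here is a rough sketch; it assumes an uncompressed dump already saved as `dump.sql` (a placeholder name) and that GNU time is installed:

# Compare wall-clock time and output size for a few compression levels.
for level in 1 5 9; do
    # GNU time's %e is elapsed wall-clock seconds
    /usr/bin/time -f "gzip -$level took %e seconds" gzip -"$level" -c dump.sql > dump.sql."$level".gz
done
ls -lh dump.sql.*.gz   # compare the resulting sizes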

The redirection will again buffer data in blocks before writing it to disk. Much of this will use memory reported as buffers. Modern file systems may keep several seconds' worth of data in memory before flushing it to disk.

> $2_`date +%d`.sql.gz
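
You can watch that write-back buffering happen while a backup runs; a small sketch, run from a second terminal (the 2-second interval is arbitrary):

# Dirty is data waiting to be written out; Writeback is data being flushed right now.
watch -n 2 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
# Once the backup has finished, sync forces any remaining dirty pages to disk.
sync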

Data buffers will remain in the pool after they have been written. These blocks will be quickly reclaimed if the system has a need for free memory. I've used systems where it would take several days to fill memory to the point where data buffers needed to be reclaimed. While it appears you are using slightly over half the available memory excluding buffers and cache, the buffers and cache can contribute significantly to performance.
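
One way to see this on your instance is to watch `free` while a backup runs; a minimal sketch (the interval is arbitrary):

# "cached" should grow while the backup runs, but the free figure on the
# "-/+ buffers/cache" line should stay roughly constant.
watch -n 5 free -m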

If this were all happening on one server, it might force blocks from the MySQL data files out of the buffer pool. Those blocks would have been served from memory rather than disk when next requested, so MySQL performance might slow down until they are read back in. However, on the remote system, actively read blocks may still be forced out as the dump's data is read into memory.
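
If you want to check whether a dump is pushing useful pages out of the buffer pool on the database server, a rough approach (assuming InnoDB tables) is to compare the buffer-pool counters before and after a run:

# Run against the database server before and after a dump (assumes InnoDB).
# A jump in Innodb_buffer_pool_reads (requests that missed the buffer pool and
# had to hit disk) relative to Innodb_buffer_pool_read_requests suggests that
# useful pages were pushed out by the dump.
mysql -h "$HOST" -u "$UNAME" -p"$PWORD" -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'"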

BillThor
  • So I assume it would still work OK to back up 2 GB of SQL from a remote MySQL server while also acting as a moderately busy web server with 2 GB of total memory? – Luke Wenke May 18 '17 at 08:28
  • @LukeWenke My main concern for such a load would be network and CPU load. CPU can be dealt with using `nice` at the expense of a possibly longer run (a sketch of that follows below these comments); the longer run may impact the database server, as the export is transactional. I don't have a ready way to deal with network load. Monitoring with a tool like `munin` or `atsar` could provide insight into any issues. – BillThor May 18 '17 at 22:39
  • Currently the backup takes 25-45 seconds... I chose a gzip setting that was a compromise between time taken and compression. I was wondering: if some changes were made to the database while it was doing the backup, would it include the changes? Because then the data might be inconsistent. There are hourly backups, and currently there doesn't seem to be a CPU/network load problem with the web server. – Luke Wenke May 19 '17 at 01:49
  • @LukeWenke If the backups are that fast, having a transactional backup should not cause issues. It should also remove the risk of an inconsistent backup. – BillThor May 19 '17 at 03:22
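
For reference, the `nice` suggestion from the comments could look something like the sketch below; it only lowers the local CPU priority of gzip and does nothing for the remote dump or the network:

# gzip is the CPU-heavy step on the web server; give it the lowest CPU priority.
mysqldump -h "$HOST" -u "$UNAME" -p"$PWORD" --single-transaction "$1" \
  | nice -n 19 gzip -5 > "$2"_$(date +%d).sql.gz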