Remote copy of LMDB

0

I want to migrate an LMDB from my local machine to another remote machine, but there is some weirdness about the file size. According to the filesystem, an LMDB is a directory containing two files: data.mdb and lock.mdb.

The output of ls -altoh lmdb indicates that data.mdb has a file size of 4T, which matches the map_size parameter I used to create the LMDB. All this means is that when the DB is opened, the OS will memory map the file, giving it 4T of virtual address space. The output of du -hs lmdb indicates that the LMDB is taking up ~900MB of disk, which agrees with the size reported by python -mlmdb -e lmdb stat. In other words, data.mdb is a sparse file.
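The ls/du discrepancy can be reproduced on a small scale. The following is a minimal sketch, assuming a Unix-like system whose filesystem supports sparse files and reports st_blocks in 512-byte units (true on Linux, not guaranteed everywhere). The helper names make_sparse_file and apparent_and_allocated are made up for illustration.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file whose apparent size is `size` bytes but which holds no data."""
    with open(path, "wb") as f:
        # Extending a file with truncate() leaves a hole on filesystems
        # that support sparse files; no data blocks are written.
        f.truncate(size)


def apparent_and_allocated(path):
    """Return (apparent size, bytes allocated on disk) for `path`."""
    st = os.stat(path)
    # st_size is what `ls -l` shows. st_blocks counts allocated blocks,
    # assumed here to be 512-byte units, which is what `du` adds up.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        make_sparse_file(path, 64 * 1024 * 1024)
        apparent, allocated = apparent_and_allocated(path)
        print("apparent size (ls):", apparent)
        print("allocated on disk (du):", allocated)
```

A tool that only looks at the apparent size, as the remote scp seems to, will try to send every byte, holes included.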

When I do a local copy cp -r lmdb lmdb_copy, it works as expected: 900MB of data is copied. The same happens when I do scp -r lmdb lmdb_copy2 (using scp to do a local copy).

However, when I do a remote copy scp -r lmdb user@remotehost:~/lmdb_copy, scp attempts to copy 4T of data, as indicated by the progress bar. I stopped the scp after 2GB of data had been transferred.

On the remote machine, ls and du both report 2GB as the size of the LMDB. python -mlmdb -e lmdb_copy stat reports the correct size of 900MB and that all of the entries are there. I've verified that I can print out all of the keys and they are correct.

With this background, my question is: why does scp attempt to copy all 4T of the memory map size? Ideally, I'd like to let scp do its thing in the background without having to manually kill it.

waldol1

Posted 2015-09-09T17:07:24.007

Reputation: 101

Answers

1

You could try using rsync to do the copy. Its documentation says the --sparse option handles sparse files efficiently. Something like

rsync --rsh=ssh --archive --sparse lmdb user@remotehost:~/lmdb_copy
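To give a rough idea of what sparse-aware copying means, here is a minimal local sketch in Python. It is not how rsync is implemented; it only shows the general technique of skipping over runs of zero bytes on the writing side so the destination keeps its holes. The function name sparse_copy and the 1 MiB chunk size are arbitrary choices for illustration.

```python
import os

CHUNK = 1024 * 1024  # arbitrary chunk size chosen for this sketch


def sparse_copy(src, dst, chunk_size=CHUNK):
    """Copy src to dst, seeking over all-zero chunks instead of writing them."""
    zeros = bytes(chunk_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            if chunk == zeros[:len(chunk)]:
                # Nothing but zeros: move the write position forward and
                # leave a hole rather than storing the zeros.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # Fix the final length so that a trailing hole is preserved.
        fout.truncate(fout.tell())


if __name__ == "__main__":
    import sys
    sparse_copy(sys.argv[1], sys.argv[2])
```

Note that the reader still has to walk the full apparent size of the source, so the copy can keep running long after the last real data has been written.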

As an aside, here is some insight into why scp works locally but not over a network: when scp sees that it's a local-to-local copy, it just passes the request to the cp command directly. Monitoring an scp command's system calls, I caught it doing this

execve("/bin/sh", ["sh", "-c", "exec cp -r foo bah"], [/* 20 vars */])

mykel

Posted 2015-09-09T17:07:24.007

Reputation: 196

Thanks, I'll try that. I found that the mdb_copy function will locally copy the lmdb so that the file isn't sparse (ls shows the correct file size), so that the scp will work as intended. – waldol1 – 2015-09-10T15:50:09.827
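For anyone who prefers to stay in Python, the py-lmdb binding exposes Environment.copy, which should give a similar result to the mdb_copy tool. The following is a sketch only: the helper name copy_compact is made up, compact=True is an optional choice, and it has not been verified against a 4T map.

```python
import os


def copy_compact(src_dir, dst_dir):
    """Write a copy of the LMDB environment in src_dir into dst_dir."""
    import lmdb  # third-party binding: pip install lmdb

    # The destination directory has to exist before copying.
    os.makedirs(dst_dir, exist_ok=True)
    # Open read-only and without locking, since we only read the source.
    env = lmdb.open(src_dir, readonly=True, lock=False)
    try:
        # compact=True asks LMDB to omit free pages from the copy.
        env.copy(dst_dir, compact=True)
    finally:
        env.close()


if __name__ == "__main__":
    import sys
    copy_compact(sys.argv[1], sys.argv[2])
```

The resulting directory can then be sent with plain scp, since ls should report the real data size for the copied data.mdb.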

Hmm, it worked better, but still not what I want. A bit more than 900MB (971MB) of data got transferred (as shown by ls/du on the remote machine), but rsync was still running (and reporting ridiculous transfer rates of 1000GB/s), even though the file size on the remote machine had stopped increasing. – waldol1 – 2015-09-18T14:53:06.050

The transfer rates rsync reports attempt to show you the net rate, so they would go high for sparse files (just as they would for "no change" sections). Did you let rsync finish? – mykel – 2015-09-20T12:37:07.883