
Under Linux 2.6 (CentOS 5.3, actually), I have a file system that works okay for a few minutes and then drops to read-only mode while I am reading/writing data to it.

It is an ext3 file system sitting on a LUKS (cryptsetup) device via device-mapper. The LUKS raw device is actually a loopback device, and that loopback device points at a file on a remote system that is mounted locally via sshfs. Yeah, lots of containers. This is what I'm doing to store a secure backup in a cloud-like environment that is a little less secure than I would prefer.

In detail:

  1. Remote system has a folder backupFolder containing a preallocated 20G file backupFile.
  2. Local system mounts the backupFolder using sshfs to a local mount point backupMapped.
  3. Local system creates a loopback device (using losetup) pointing to backupMapped/backupFile.
  4. Local system applies a luks device mapping to the loopback device using cryptsetup luksOpen, yielding /dev/mapper/backup.
  5. Local system mounts /dev/mapper/backup as an ext3 device on a different local mount point, say /root/rpg/d01
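
Roughly, that command sequence looks like the following (host names, paths, and the loop device number are placeholders here, not my exact values):

sshfs user@remotehost:/path/to/backupFolder /root/backupMapped     # step 2
losetup /dev/loop0 /root/backupMapped/backupFile                   # step 3
cryptsetup luksOpen /dev/loop0 backup                              # step 4
mount -t ext3 /dev/mapper/backup /root/rpg/d01                     # step 5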

Then I can use rsync to update the data in /root/rpg/d01 (the cleartext view of the backup) from my original data on the local system.
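
The rsync step itself is then just a local copy into that mount point, something like this (the source path is a placeholder):

rsync -av /path/to/originalData/ /root/rpg/d01/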

This works well when testing with small differences, i.e. small rsyncs. However, for larger operations, after about fifteen minutes of apparently successful operation, the rsync hangs for quite a while, then carries on, generating various error messages for the rest of the way through the source files:

rsync: mkstemp "/root/rpg/d01/db_pgexports/dbdump.sql" failed: Read-only file system (30)

and

rsync: cannot stat destination "/root/rpg/d01/db_myexports/": Input/output error (5)
rsync error: errors selecting input/output files, dirs (code 3) at main.c(493) [receiver=2.6.8]

and

rsync: recv_generator: mkdir "/root/rpg/d01/vault/jjm" failed: Read-only file system (30)
rsync: stat "/root/wpg/d01/vault/jjm" failed: No such file or directory (2)

I see various errors in dmesg:

EXT3 FS on dm-12, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Buffer I/O error on device dm-12, logical block 119212
lost page write due to I/O error on dm-12
Buffer I/O error on device dm-12, logical block 119213
lost page write due to I/O error on dm-12
Buffer I/O error on device dm-12, logical block 119214
lost page write due to I/O error on dm-12
<snip>
Buffer I/O error on device dm-12, logical block 119221
lost page write due to I/O error on dm-12
Aborting journal on device dm-12.
ext3_abort called.
EXT3-fs error (device dm-12): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device dm-12): ext3_get_inode_loc: unable to read inode block - inode=2480641, block=4980738
EXT3-fs error (device dm-12): ext3_get_inode_loc: unable to read inode block - inode=32641, block=65538
EXT3-fs error (device dm-12): ext3_find_entry: reading directory #65281 offset 0
EXT3-fs error (device dm-12): ext3_find_entry: reading directory #65281 offset 0
EXT3-fs error (device dm-12): ext3_find_entry: reading directory #65281 offset 0
<snip>

and basically the same messages in /var/log/messages.

Thinking this had something to do with timeouts (despite the fact that the connection should be kept busy by the constant read/write traffic), I experimented a bit with ssh configuration parameters such as

ServerAliveInterval 20

ServerAliveCountMax 4000

and using sshfs -o reconnect, even -o workaround=all.

All to no avail.
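
For reference, the keep-alive settings went into the ssh client configuration for the remote host, roughly like this (the host name is a placeholder):

Host remotehost
    ServerAliveInterval 20
    ServerAliveCountMax 4000

and the sshfs mount was retried with something like:

sshfs user@remotehost:/path/to/backupFolder /root/backupMapped -o reconnect -o workaround=all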

Can anyone perhaps shed some light on what's breaking, and how I can make this thing reliably stay in read-write mode?

Thanks, Steve La Rocque

  • The first thing I would want to do is reduce the complexity of this setup to find the breakage. The low-hanging fruit in that regard is sshfs: rsync already speaks the ssh protocol, so don't mount anything with sshfs; just do a local-to-remote rsync: rsync -azp /somedir/ user@1.2.3.3:/some/other/dir/ – foocorpluser May 03 '12 at 16:35
  • The remotely-held data needs to be kept secure: the remote side should never have the decryption key or the plaintext, so rsync over the net won't do that for me; one side is plaintext, and the other would be plaintext too. I could perhaps switch to nfs in place of sshfs. – slarocque May 03 '12 at 17:44
  • Okay, but we are just debugging right now; does it work? – foocorpluser May 03 '12 at 18:36
  • Also, rsync over ssh uses the ssh protocol; communication and authentication will be as secure and encrypted as with sshfs. – foocorpluser May 03 '12 at 18:39
  • Why not just encrypt /dev/mapper/backup on the local machine and rsync it, already encrypted, to your cloud storage? Then the keys can stay on the local box, and the remote storage can stay encrypted. – foocorpluser May 03 '12 at 18:56
  • I did try this without sshfs locally and the problem went away. In fact, I tried it on a LAN between machines and the problem went away. It seems I only run into these problems when I use the internet to connect to the real cloud. I am guessing that network resets are causing sshfs to get confused, and this precipitates the errors. But sshfs should be pretty robust, given that it's TCP-based and I'm using -o reconnect to get sshfs to "transparently" handle minor connectivity glitches. – slarocque May 08 '12 at 19:40

0 Answers