Hanging XFS filesystem on encrypted USB device

1

I have an XFS filesystem on a LUKS-encrypted partition of a USB device, mounted on Arch Linux.

It sometimes works fine, but occasionally it hangs (or becomes extremely slow?) for minutes at a time while writing to the device. It eventually recovers, often after I terminate the writing process. This is what dmesg shows:

[579742.480204] XFS (dm-3): Mounting V5 Filesystem
[579742.571959] XFS (dm-3): Ending clean mount
[579925.430501] INFO: task xfsaild/dm-3:15682 blocked for more than 120 seconds.
[579925.430508]       Tainted: G        W  OE     4.19.41-1-lts #1
[579925.430510] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[579925.430512] xfsaild/dm-3    D    0 15682      2 0x80000080
[579925.430516] Call Trace:
[579925.430526]  ? __schedule+0x29b/0x860
[579925.430530]  schedule+0x28/0x80
[579925.430589]  xfs_log_force+0x163/0x2d0 [xfs]
[579925.430595]  ? wake_up_q+0x70/0x70
[579925.430648]  xfsaild+0x1ac/0x7b0 [xfs]
[579925.430703]  ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
[579925.430707]  kthread+0x112/0x130
[579925.430710]  ? kthread_park+0x80/0x80
[579925.430713]  ret_from_fork+0x35/0x40

And journald says:

Jul 10 15:59:27 <username> kernel: INFO: task xfsaild/dm-3:15682 blocked for more than 120 seconds.
Jul 10 15:59:27 <username> kernel:       Tainted: G        W  OE     4.19.41-1-lts #1
Jul 10 15:59:27 <username> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 10 15:59:27 <username> kernel: xfsaild/dm-3    D    0 15682      2 0x80000080
Jul 10 15:59:27 <username> kernel: Call Trace:
Jul 10 15:59:27 <username> kernel:  ? __schedule+0x29b/0x860
Jul 10 15:59:27 <username> kernel:  schedule+0x28/0x80
Jul 10 15:59:27 <username> kernel:  xfs_log_force+0x163/0x2d0 [xfs]
Jul 10 15:59:27 <username> kernel:  ? wake_up_q+0x70/0x70
Jul 10 15:59:27 <username> kernel:  xfsaild+0x1ac/0x7b0 [xfs]
Jul 10 15:59:27 <username> kernel:  ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
Jul 10 15:59:27 <username> kernel:  kthread+0x112/0x130
Jul 10 15:59:27 <username> kernel:  ? kthread_park+0x80/0x80
Jul 10 15:59:27 <username> kernel:  ret_from_fork+0x35/0x40
Jul 10 16:00:54 <username> sudo[17962]:  <username> : TTY=pts/12 ; PWD=/home/<username> ; USER=root ; COMMAND=/usr/bin/kill -SIGKILL 17936
Jul 10 16:00:54 <username> sudo[17962]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jul 10 16:00:54 <username> sudo[17962]: pam_unix(sudo:session): session closed for user root
Jul 10 16:01:42 <username> systemd[1]: mnt-backupd.mount: Succeeded.
Jul 10 16:01:42 <username> systemd[523]: mnt-backupd.mount: Succeeded.
Jul 10 16:03:33 <username> kernel: INFO: task xfsaild/dm-3:15682 blocked for more than 120 seconds.
Jul 10 16:03:33 <username> kernel:       Tainted: G        W  OE     4.19.41-1-lts #1
Jul 10 16:03:33 <username> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 10 16:03:33 <username> kernel: xfsaild/dm-3    D    0 15682      2 0x80000080
Jul 10 16:03:33 <username> kernel: Call Trace:
Jul 10 16:03:33 <username> kernel:  ? __schedule+0x29b/0x860
Jul 10 16:03:33 <username> kernel:  schedule+0x28/0x80
Jul 10 16:03:33 <username> kernel:  xfs_log_force+0x163/0x2d0 [xfs]
Jul 10 16:03:33 <username> kernel:  ? wake_up_q+0x70/0x70
Jul 10 16:03:33 <username> kernel:  xfsaild+0x1ac/0x7b0 [xfs]
Jul 10 16:03:33 <username> kernel:  ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
Jul 10 16:03:33 <username> kernel:  kthread+0x112/0x130
Jul 10 16:03:33 <username> kernel:  ? kthread_park+0x80/0x80
Jul 10 16:03:33 <username> kernel:  ret_from_fork+0x35/0x40
Jul 10 16:03:55 <username> kernel: XFS (dm-3): Unmounting Filesystem

I am wondering what this means and where it comes from, or, alternatively, how I can investigate the problem further.

I assume that even though the setup is a bit unconventional (USB -> LUKS -> XFS), it should not lead to such effects. Can I find out whether it is a hardware problem? Or, if it is on the software side, how can I make it go away?

0range

Posted 2019-07-10T14:15:57.550

Reputation: 629

Answers

0

I also ran into this. I spent a lot of time on it, including testing with Btrfs and using blktrace to compare the requests happening at the block level for the filesystem, the LUKS volume, and my physical drive, and I came to the conclusion that it's an XFS bug. I've gone back to Btrfs, so I won't be pursuing this further, but I emailed the XFS mailing list, so you might be able to report your problem there if you're still encountering it. My setup was a Samsung 970 Evo 1TB NVMe, also with LUKS, inside a QEMU virtual machine. See https://www.spinics.net/lists/linux-xfs/msg31927.html
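If you do report it, it may help to say how often the warning fires and for which task. Below is a minimal Python sketch for that, not a diagnostic tool: it only counts the "blocked for more than N seconds" lines in kernel log text. The names `hung_tasks`, `summarize`, and `SAMPLE` are made up for illustration, and the sample lines are copied from the logs in the question.

```python
import re
from collections import Counter

# Matches the kernel's hung-task warning. The prefix before "INFO:" is ignored,
# so both dmesg lines ("[579925.430501] ...") and journald lines
# ("Jul 10 15:59:27 host kernel: ...") are accepted.
HUNG_TASK = re.compile(
    r"INFO: task (?P<task>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

def hung_tasks(log_text):
    """Return one (task, pid, seconds) tuple per hung-task warning found."""
    return [
        (m.group("task"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(log_text)
    ]

def summarize(log_text):
    """Count warnings per (task, pid) pair."""
    return Counter((task, pid) for task, pid, _ in hung_tasks(log_text))

# Two lines taken from the question: one in dmesg format, one in journald format.
SAMPLE = """\
[579925.430501] INFO: task xfsaild/dm-3:15682 blocked for more than 120 seconds.
Jul 10 16:03:33 <username> kernel: INFO: task xfsaild/dm-3:15682 blocked for more than 120 seconds.
"""

counts = summarize(SAMPLE)
```

To run it on real data, pass the saved output of `dmesg` or `journalctl -k` to `summarize()` as a string. A count that grows only while a write is in progress would match the behaviour described in the question, but it does not show whether the drive, the dm-crypt layer, or XFS is at fault.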

user1902689

Posted 2019-07-10T14:15:57.550

Reputation: 152