
After rebooting a server (Debian 9.5, 64-bit), the boot got stuck at "A start job is running for Create Volatile Files and Directories". I worked around it by following "boot-stuck-at-a-start-job-is-running-for-create-volatile-files-and-directories".

I can't figure out the root cause of this issue. I have searched through many similar questions, but they only offer various workarounds that do not fit my case, and none of them explains the root cause.

We have not reached the limit on files or (sub)directories, and the dir_nlink feature is set on the ext4 filesystem.

# sudo tune2fs -l /dev/debian-vg/root | grep dir_nlink
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum

And there is more than 50% of inode and disk capacity left.
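For reference, both figures can be re-checked with something like the following. This is a minimal sketch, assuming GNU coreutils `df` (the `--output` option is a GNU extension); `usage_summary` is a hypothetical helper name, not something from the system.

```shell
#!/bin/sh
# Print inode usage and block usage for the filesystem holding a path.
# Assumption: GNU df, which supports --output with these field names.
usage_summary() {
    # target = mount point, ipcent = inode use %, pcent = block use %
    df --output=target,ipcent,pcent "$1"
}

usage_summary /
```

The `IUse%` column is the one that matters when a directory fills up with millions of tiny files, since those consume inodes long before they consume disk space.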

The original /tmp directory contained little data: its files and directories used only 1G of disk space in total.

Some info:

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.9.0-7-amd64 root=/dev/mapper/debian--vg-root ro net.ifnames=0 biosdevname=0 console0=tty0 console=ttyS0,115200n8 quiet

$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=4077900k,nr_inodes=1019475,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=817924k,mode=755)
/dev/mapper/debian--vg-root on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=36,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=9039)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=817920k,mode=700,uid=1000,gid=1000)

$ lsblk
NAME                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                 254:0    0 1000G  0 disk 
└─vda1              254:1    0 1000G  0 part 
  └─debian--vg-root 253:0    0    3T  0 lvm  /
vdb                 254:16   0    4T  0 disk 
vdc                 254:32   0    2T  0 disk 
└─debian--vg-root   253:0    0    3T  0 lvm  /

$ blkid
/dev/vda1: UUID="ijfyeQ-***" TYPE="LVM2_member" PARTUUID="d6***"
/dev/mapper/debian--vg-root: UUID="2d2294a9-***" TYPE="ext4"
/dev/vdc: UUID="PXrGC9-***" TYPE="LVM2_member"

$ sudo find /tmp/ | wc -l
28905144
VictorLee
  • What kernel options do you use to boot? I think the root cause is the RO mode of root fs with missing `/tmp` directory as `tmpfs`. In this case `/tmp` is just a directory on the `/` partition, which is read-only. – Anton Danilov Apr 23 '22 at 14:52
  • No special options, all defaults. Read-only mode? How can I check this: `RO mode of root fs with missing /tmp directory as tmpfs`? – VictorLee Apr 23 '22 at 14:57
  • Show the output of these commands: `cat /proc/cmdline`, `mount`, `lsblk` and `blkid` – Anton Danilov Apr 23 '22 at 15:47
  • 1GB can be any number of tiny, even zero-byte files. The answer you linked to referred to millions of tiny files. `find /tmp/ | wc -l` to see how many files are in your /tmp . Each file takes some number of ms to delete. Rule that out at least. 50% inode usage is somewhat high on a typical machine. If it's millions, next is finding out what's writing them. Either a bug or something a cron or logrotate script can take care of. – Bill McGonigle Apr 23 '22 at 19:34
  • @AntonDanilov and Bill McGonigle, I have updated the question with the output of those commands. – VictorLee Apr 24 '22 at 08:52
  • Can you access the host whilst the job is still stuck? – Matthew Ife Apr 26 '22 at 15:10
  • @MatthewIfe No, I can only access the host in single-user mode. – VictorLee Apr 27 '22 at 01:30

2 Answers


As your `sudo find /tmp/ | wc -l` output shows, you do indeed have close to 30 million entries in /tmp. You could start with a fresh /tmp directory as pointed out in other answers, and you probably should, but as you have guessed, unless you get to the bottom of this you'll end up in the same situation again.
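Starting with a fresh directory could look roughly like the sketch below. It is an illustration under assumptions, not a tested recovery procedure: `fresh_dir` is a hypothetical helper name, it assumes it is run as root from single-user mode with nothing actively using the directory, and it uses GNU `date` formatting for the suffix.

```shell
#!/bin/sh
# Move a directory aside and recreate it empty with sticky, world-writable
# permissions (the usual mode for /tmp). Prints the path of the old copy.
fresh_dir() {
    dir=$1
    old="$dir.old.$(date +%Y%m%d%H%M%S)"
    # A rename within the same filesystem does not touch the entries inside.
    mv "$dir" "$old" || return 1
    mkdir "$dir" || return 1
    chmod 1777 "$dir" || return 1
    printf '%s\n' "$old"
}
```

Calling `fresh_dir /tmp` would leave the old contents in a sibling directory, which can then be inspected or deleted in the background after the system has booted, instead of during boot.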

Unfortunately there could be all kinds of reasons for this issue. For example, one issue I have personally experienced is `atd` going haywire and creating empty files in /tmp in a tight loop (thousands per second or thereabouts). I am not saying this is your case, as `at` is not a popular tool these days, but you'll have to look at the filenames in /tmp and try to guess where they came from based on their names, and maybe their timestamps.

Try `sudo find /tmp -ls | more` and look for any clues. It will hopefully be obvious.
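With tens of millions of entries, paging through the listing may be slow, so grouping the names can help. The following is a minimal sketch, assuming GNU `find` (for `-printf`) and assuming that files written by one program share a leading alphabetic prefix; `summarize_prefixes` is a hypothetical helper name.

```shell
#!/bin/sh
# Count top-level entries in a directory by their leading alphabetic prefix
# and print the 20 most common prefixes, most frequent first.
summarize_prefixes() {
    # %f prints the basename; -mindepth 1 skips the directory itself.
    find "$1" -mindepth 1 -maxdepth 1 -printf '%f\n' \
        | sed -E 's/[^[:alpha:]].*$//' \
        | awk '{ count[$0]++ } END { for (p in count) print count[p], p }' \
        | sort -rn \
        | head -n 20
}
```

Running `summarize_prefixes /tmp` as root prints one line per prefix with its count. A single prefix accounting for nearly all entries would point at the program responsible. Names that start with a non-letter are grouped under an empty prefix.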

chutz

There are at least two causes of your situation:

  • 1. The result of `find /tmp/ | wc -l`, 28905144, shows that you have an enormous number of files in the /tmp directory. Evidently /tmp was not being cleared out properly at boot or at shutdown.
  • 2. The / filesystem was set up with a large capacity, 3T. With more space, addressing on an HDD (I am guessing it is not an SSD) gets slower.

Advice:

  • 1. Check whether the files under /tmp were created by normal activity; that should lead you to the cause.
  • 2. If possible, keep the / filesystem at no more than 2T, or use high-performance media such as an NVMe SSD.
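One way to judge whether the files were created normally is to see when they appeared. The sketch below counts entries per modification day; it assumes GNU `find` (for the `%T` directives of `-printf`), and `histogram_by_day` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "YYYY-MM-DD count" for every day on which entries under a directory
# were last modified, in chronological order.
histogram_by_day() {
    # %TY, %Tm, %Td are the year, month and day of the modification time.
    find "$1" -mindepth 1 -printf '%TY-%Tm-%Td\n' \
        | awk '{ count[$0]++ } END { for (d in count) print d, count[d] }' \
        | sort
}
```

A steady trickle over months suggests that cleanup is not happening, while a spike on one day suggests a single runaway job. Running `histogram_by_day /tmp` on a tree this size will take a while, since every entry has to be examined.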
catalpa