
I’m running a Debian Squeeze AMD64 server. Target runlevel after boot is runlevel 2, which includes rsyslogd, cron, sshd and some other stuff, but not dovecot, postfix, apache2, etc. The system fails to reach runlevel 2 with several symptoms:

  • The system hangs while trying to start rsyslogd
  • Booting into runlevel 1 works, and logging in from the console works as well
  • Starting rsyslogd from runlevel 1 via /etc/init.d/rsyslog hangs
  • Starting runlevel 2 with rsyslogd disabled works
  • But then, logging in on the console fails: I get the motd, and then nothing
  • Starting sshd from runlevel 1 succeeds
  • But then, I cannot log in via ssh. Sometimes a password login gives me the motd and then nothing, sometimes not even that. Offering a public key seems to upset sshd enough that it stops talking to me altogether.
  • When rebooting from runlevel 1, the server hangs while trying to stop apache2 (which is not running, so this really should be trivial). Trying to stop apache2 while logged in to runlevel 1 hangs as well.

And that’s just the stuff which fails all the time. RAM has been tested, dmesg shows no problems. I have no clue.

Update: (shortened) output of rsyslogd -c4 -d run in runlevel 1:

    rsyslogd 4.6.4 startup, compatibility mode 4, module path ''
    caller requested object 'net', not found (iRet -3003)
    Requested to load module 'lmnet'
    loading module '/user/lib/rsyslog/lmnet.so'
    module of type 2 being loaded
    conf.c requested ref for 'lmnet', refcount 1
    rsylog runtime initialized, version 4.6.4, current users 1
    syslogd.c requested ref for 'lmnet', refcount now 2

At that point I can kill rsyslogd with Ctrl+C. However, none of the configured log files show up in /var/log.

Update2: Thanks to @DerfK I have at least narrowed down the problem, although I still have no clue about the cause. I’m now testing with /etc/init.d/apache2 stop (with no apache2 running, of course), which hangs as well and looks like an even more obvious failure.

After some testing I found out that a file containing this single line:

/usr/sbin/apache2ctl configtest > /dev/null 2>&1

hangs when run as a script, while the same line executed in an interactive shell works. I was not able to reduce it any further: every single part, the stream redirections as well as the command itself, is necessary to reproduce the hang. @DerfK also pointed me to strace, which gave a vague hint about what kind of hang we have here:

  • `wait4(-1, ` for the init scripts
  • `futex(0xsomepointer, FUTEX_WAIT_PRIVATE, 2, NULL` for the rsyslogd / apache2 binaries called by the init scripts
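The reproduction above and the strace/gdb checks suggested in the comments can be bundled into a few shell helpers. This is a minimal sketch, assuming a Debian system with coreutils' timeout, strace and gdb installed and root privileges; the function names (script_hangs, proc_state, blocked_syscall, thread_backtraces) are hypothetical. A hang is detected through timeout's exit status 124 instead of by waiting forever.

```shell
#!/bin/sh
# Helpers for poking at a hang like the one described above (Debian, run as root).
# Function names are hypothetical; strace and gdb must be installed separately.

# Run a one-line script non-interactively (as an init script would) and
# succeed if it is still running after $1 seconds. timeout(1) exits with 124
# when it had to kill the command.
script_hangs() {
    tmp=$(mktemp)
    printf '%s\n' "$2" > "$tmp"
    timeout "$1" sh "$tmp"
    rc=$?
    rm -f "$tmp"
    [ "$rc" -eq 124 ]
}

# Scheduler state of a PID (S = sleeping, D = uninterruptible, R = running),
# i.e. the first letter of the STAT column in ps aux. Linux-only (/proc).
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Show the system call the process is blocked in; strace would wait forever,
# so stop it after 5 seconds.
blocked_syscall() {
    timeout 5 strace -p "$1" 2>&1 | head -n 5
}

# Backtrace of every thread, without an interactive gdb session. Without the
# matching -dbg packages the frames may lack function names.
thread_backtraces() {
    gdb -batch -p "$1" -ex 'thread apply all bt full' 2>&1
}

# Example:
#   script_hangs 10 '/usr/sbin/apache2ctl configtest > /dev/null 2>&1' && echo hung
#   pid=$(pidof -s rsyslogd)
#   proc_state "$pid"; blocked_syscall "$pid"; thread_backtraces "$pid"
```

Running the one-liner through `sh "$tmp"` mirrors how the hang only showed up when the line was executed from a file rather than typed interactively.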

The system was installed as Debian Lenny by my hoster in autumn 2010. I upgraded it to Squeeze immediately and kept it up to date with Squeeze, which was still testing at the time. There were no big changes, though. I guess I had never tried to reboot the system before.

Update3: I found the problem. My /etc/nsswitch.conf listed ldap as a fallback backend for hosts lookups, and LDAP is not available at that point in the boot. Relying on DNS alone fixes my boot problems.
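To check whether a machine has the same problem, one can look at the hosts line in nsswitch.conf and time a lookup that goes through NSS. This is a sketch under assumptions: the exact hosts line from the question isn't shown, so something like `hosts: files dns ldap` before the fix and `hosts: files dns` after it is assumed, and the function names are hypothetical. getent is used because it resolves names through the same NSS configuration as other libc lookups.

```shell
#!/bin/sh
# Sketch: check whether hosts lookups go through ldap, and whether they hang.
# Function names are hypothetical; the file defaults to /etc/nsswitch.conf.

# Print the sources listed on the "hosts:" line, e.g. "files dns ldap".
hosts_sources() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print }' \
        "${1:-/etc/nsswitch.conf}"
}

# Succeed if ldap is one of the hosts sources.
uses_ldap_for_hosts() {
    hosts_sources "$1" | grep -qw ldap
}

# Resolve a name through NSS with getent (default: this machine's hostname)
# and succeed if the lookup is still blocked after 5 seconds.
nss_lookup_hangs() {
    timeout 5 getent hosts "${1:-$(hostname)}" > /dev/null
    [ $? -eq 124 ]
}

# Example, from runlevel 1:
#   hosts_sources                 # assumed "files dns ldap" before the fix
#   nss_lookup_hangs && echo "hosts lookup blocks"
```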

Michael Hampton
Adrian Heine
  • Since you're at the console when using runlevel 1... When it "hangs" can you ctrl-C to interrupt it? Does the system entirely freeze and the cursor no longer moves when you type? – DerfK Mar 09 '11 at 19:59
  • What happens if, instead of running the init script, you run `rsyslogd -c4 -d`? – sciurus Mar 09 '11 at 20:01
  • @DerfK Ctrl+C doesn’t work, nor does Ctrl+Alt+Del. The session seems to hang, but the system is still running – sshd still "works". – Adrian Heine Mar 09 '11 at 20:55
  • @sciurus I added the output. It seems to run, or at least it gets much further than /etc/init.d/rsyslog start. – Adrian Heine Mar 09 '11 at 21:40
  • @Adrian interesting. Do you have screen, and can you get it to run and open a couple of shells on the console? It'd be interesting to see what `strace -p pidofrsyslogd` says it's doing when it's stuck, as well as what `ps aux` says rsyslogd's status is while it's stuck. Have you looked at `dmesg` to see if the kernel is issuing any interesting errors during this? – DerfK Mar 10 '11 at 03:17
  • Does the hostname appear in /etc/hosts? – Mark Wagner Mar 09 '11 at 20:38
  • Yes, there is a correct entry in hosts: xxx.xxx.xxx.xxx domain.net host.domain.net host – Adrian Heine Mar 09 '11 at 21:31
  • @DerfK tmux :) `strace` outputs `futex(0xsomepointer, FUTEX_WAIT_PRIVATE, 2, NULL` for rsyslogd and `wait4(-1, ` for the rsyslog init script. `ps aux` says both sleep. dmesg did not say anything since bootup. – Adrian Heine Mar 10 '11 at 06:27
  • @DerfK /etc/init.d/apache2 stop gives the same output in strace. – Adrian Heine Mar 10 '11 at 06:45
  • @Adrian The FUTEX_WAIT_PRIVATE hang makes this a pthreads issue; many of the hits on Google implicate nvidia, others blame whatever is handy. Was this a fresh squeeze install or an upgrade, have you installed anything else? (Better yet: has this ever worked on this machine?) Do you have gdb installed there? While it's hung, run `gdb`, then `attach pidofrsyslogd` and `backtrace full` (update the question; it won't fit in a comment). Also, for giggles, run `strace -f rsyslogdinitscript start`; the output isn't important, just see whether it works while it's being traced (tracing interferes with threading). – DerfK Mar 10 '11 at 14:00
  • @DerfK I updated the question. I gdb’ed the init script before, but the back trace obviously lacked all symbols and I am not really sure which dbg package is missing. Running the script directly using strace gives the same hang, I tried this earlier as well. Thanks for your help so far, btw :) – Adrian Heine Mar 12 '11 at 07:57
  • @Adrian you can work out the -dbg packages you need with `ldd /usr/sbin/rsyslogd` (or wherever it is...) it will tell you the libraries it's using, and where it is finding those libraries ( libfoo => /lib/libfoo.so.1 ) use `dpkg -S /lib/libfoo.so.1` to get the package name, then add -dbg. – DerfK Mar 12 '11 at 14:42
  • @DerfK Thank you very much for helping me through this. Updated with solution :) – Adrian Heine Mar 14 '11 at 08:03
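DerfK's ldd + dpkg -S recipe for finding the missing -dbg packages can be scripted roughly as follows. This is a sketch assuming a Debian system; the function names are hypothetical, and not every library package actually has a -dbg counterpart (newer releases ship -dbgsym packages instead), so the output is a list of candidates to try with apt-get.

```shell
#!/bin/sh
# Sketch of the ldd + dpkg -S recipe from the comment above.
# Function names are hypothetical; run on the affected Debian system.

# Library paths a binary is resolved against, taken from ldd lines of the
# form "libfoo.so.1 => /lib/libfoo.so.1 (0x...)".
linked_libraries() {
    ldd "$1" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Read "dpkg -S" output ("pkg: /path", or "pkg:arch: /path" on multiarch
# systems) and print one candidate "<pkg>-dbg" name per package.
dbg_package_names() {
    cut -d: -f1 | sort -u | sed 's/$/-dbg/'
}

# binary -> libraries -> owning packages -> candidate -dbg packages.
dbg_packages_for() {
    linked_libraries "$1" | xargs -r dpkg -S 2>/dev/null | dbg_package_names
}

# Example:
#   dbg_packages_for /usr/sbin/rsyslogd
```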

2 Answers


This sounds to me like some basic network service isn't being started. Compare the contents of /etc/rc2.d with /etc/rc3.d to see if runlevel 3 starts anything runlevel 2 doesn't (normally it does, but usually it's not something fundamental).
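A minimal sketch of that comparison, assuming the usual Debian layout of S (start) links named like S20cron in /etc/rcN.d; the function names are hypothetical. The sequence numbers are stripped because the same service may carry different numbers in different runlevels.

```shell
#!/bin/sh
# Sketch: which services does one runlevel start that another does not?
# Function names are hypothetical; on Debian the directories are /etc/rcN.d.

# Service names with an S (start) link in a runlevel directory, with the
# "S" and the two-digit sequence number stripped (S20cron -> cron).
started_services() {
    for link in "$1"/S[0-9][0-9]*; do
        [ -e "$link" ] || [ -L "$link" ] || continue   # glob matched nothing
        basename "$link" | sed 's/^S[0-9][0-9]//'
    done | sort -u
}

# Services started in the second directory but not in the first.
only_in_second() {
    a=$(mktemp); b=$(mktemp)
    started_services "$1" > "$a"
    started_services "$2" > "$b"
    comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# Example:
#   only_in_second /etc/rc2.d /etc/rc3.d
```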

geekosaur
  • Surely rc3 starts stuff which is not included in rc2, since I stripped rc2 down to a minimum by removing dovecot, postfix and apache2. Which kind of basic network service are you thinking of, and why should it be started in 2 or 3 and not in S? After all, the network connection works: I get the motd via ssh, just nothing more. – Adrian Heine Mar 09 '11 at 19:13

Debian Squeeze does concurrent startup by default, which means multiple init scripts run at the same time during boot. You can try disabling this so that only one script runs at a time, which helps you find out exactly which step is failing. Since the init scripts then run in the same order every time, it should fail on the same one every time unless it is a much more serious problem.

To disable concurrent booting add CONCURRENCY=none to /etc/default/rcS. Remove the line to restore the default.
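The edit described above can be done with two small helpers. This is a sketch assuming GNU sed (for -i) and the /etc/default/rcS path from the answer; the function names are hypothetical. Note that restore_concurrency removes any uncommented CONCURRENCY= line, including one that was there before.

```shell
#!/bin/sh
# Sketch: toggle the CONCURRENCY setting the answer above describes.
# Function names are hypothetical; GNU sed is assumed for -i.

# Replace any existing CONCURRENCY= line with CONCURRENCY=none.
disable_concurrency() {
    f=${1:-/etc/default/rcS}
    sed -i '/^CONCURRENCY=/d' "$f"
    echo 'CONCURRENCY=none' >> "$f"
}

# Remove the line again to go back to the default behaviour.
restore_concurrency() {
    f=${1:-/etc/default/rcS}
    sed -i '/^CONCURRENCY=/d' "$f"
}

# Example (as root), then reboot and watch which init script stalls:
#   disable_concurrency
#   restore_concurrency   # once the culprit is found
```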

Arrowmaster