
I have a Sun x4540 that was recently converted from an earlier OpenSolaris release to NexentaStor Enterprise Edition. The system disks were wiped, and the zpools were exported and reimported. The system serves roughly 30 virtual machines to several VMware vSphere hosts over NFS on 10GbE.

Since then, the system has been crashing roughly every two weeks. Each crash triggers an ASR through the ILOM, and the system reboots on its own after 5-10 minutes. I have the following core files in the root directory:

-rw-------   1 root root 2237608178 Mar 14 21:06 core
-rw-------   1 root root   81061304 Feb  8 01:23 core.mountd.1297149806
-rw-------   1 root root   69863784 Mar  6 16:34 core.mountd.1299450869
-rw-------   1 root root   36644272 Mar  6 16:39 core.mountd.1299451179

How can I debug these with mdb to understand what is happening? I saw a brief tutorial linked to the Nexenta site at: http://kristof.willen.be/node/1100, but it doesn't seem to apply directly.

ewwhite

2 Answers


The key was to change to the `/var/crash/myhost` directory and run `savecore -f vmdump.x` on the compressed crash dump file. That expands the dump into `unix.x` and `vmcore.x` files, which can then be opened with `mdb -k unix.x vmcore.x`, where "x" is the crash dump number.
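The steps above can be sketched as a small shell function. This is only an illustration, not a tested procedure for NexentaStor: the function name `analyze_dump` is hypothetical, the default directory assumes dumps are saved under `/var/crash/<hostname>`, and the `::status`, `::msgbuf` and `::stack` dcmds are just a reasonable first look at a panic.

```shell
# Hypothetical helper: expand vmdump.N and print a first-pass panic summary.
# Usage: analyze_dump <dump-number> [crash-directory]
analyze_dump() {
    n="$1"
    # Assumption: savecore directory is /var/crash/<hostname>
    dir="${2:-/var/crash/$(uname -n)}"

    cd "$dir" || return 1
    if [ ! -f "vmdump.$n" ]; then
        echo "vmdump.$n not found in $dir" >&2
        return 1
    fi

    # Expand the compressed dump into unix.N and vmcore.N in $dir
    savecore -f "vmdump.$n" "$dir" || return 1

    # Feed a few dcmds to mdb non-interactively:
    #   ::status  panic string and dump summary
    #   ::msgbuf  kernel messages leading up to the panic
    #   ::stack   stack trace of the panicking thread
    printf '::status\n::msgbuf\n::stack\n' | mdb -k "unix.$n" "vmcore.$n"
}
```

For example, `analyze_dump 0` would process `vmdump.0` in the default directory. For interactive work, run `mdb -k unix.0 vmcore.0` directly and type the dcmds at the prompt.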

ewwhite

Do you know if the kernel itself is crashing? It might be worth enabling kernel crash dumps too.

Check the URL below for instructions on how to analyze the core files and how to enable kernel crash dumps.

http://developers.sun.com/solaris/articles/manage_core_dump.html
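As a rough sketch of the "analyze the core files" part for process cores such as `core.mountd.*`, something like the following could be used for a first look. The function name `inspect_cores` is hypothetical, and the sketch assumes a POSIX-style shell and that `pstack` (which accepts a core file on Solaris-derived systems) is installed; it falls back to suggesting `mdb` otherwise.

```shell
# Hypothetical helper: print a stack trace for each process core in a directory.
# Usage: inspect_cores [directory]   (defaults to /, where the cores above landed)
inspect_cores() {
    dir="${1:-/}"
    found=0

    for c in "$dir"/core.*; do
        # Skip the literal pattern when nothing matches
        [ -f "$c" ] || continue
        found=1
        echo "== $c"
        if command -v pstack >/dev/null 2>&1; then
            # Show the stack of every thread at the time of the crash
            pstack "$c"
        else
            echo "pstack not available; try: mdb $c (then ::status and \$C)"
        fi
    done

    if [ "$found" -ne 1 ]; then
        echo "no core.* files in $dir" >&2
        return 1
    fi
}
```

Note that this only covers user-level cores. A kernel crash dump is handled separately, through `dumpadm` and `savecore`.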

# mkdir -p /var/crash/`uname -n`
# dumpadm -y -s /var/crash/`uname -n`
# dumpadm 
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
Savecore directory: /var/crash/$hostname
  Savecore enabled: yes

# mkdir -p /var/core/`uname -n`
# coreadm -g /var/core/`uname -n`/core.%n.%f.%p -G all -e log -e global
# coreadm 
     global core file pattern: /var/core/$hostname/core.%n.%f.%p
     global core file content: all
       init core file pattern: core
       init core file content: default
            global core dumps: enabled
       per-process core dumps: enabled
      global setid core dumps: disabled
 per-process setid core dumps: disabled
     global core dump logging: enabled
Giovanni Tirloni
  • This was a kernel crash. I was missing the `savecore -f vmdump.0` step, which placed debuggable kernel core dumps into the `/var/crash/myhost` directory. From there, I needed to run `mdb -k unix.0 vmcore.0`. – ewwhite Mar 16 '11 at 12:55