0

I run a SmartOS system on a Hetzner EX4S (Intel Core i7-2600, 32G RAM, 2x3Tb SATA HDD). There are six virtual machines on the host:

[root@10-bf-48-7f-e7-03 ~]# vmadm list
UUID                                  TYPE  RAM      STATE             ALIAS
d2223467-bbe5-4b81-a9d1-439e9a66d43f  KVM   512      running           xxxx1
5f36358f-68fa-4351-b66f-830484b9a6ee  KVM   1024     running           xxxx2
d570e9ac-9eac-4e4f-8fda-2b1d721c8358  OS    1024     running           xxxx3
ef88979e-fb7f-460c-bf56-905755e0a399  KVM   1024     running           xxxx4
d8e06def-c9c9-4d17-b975-47dd4836f962  KVM   4096     running           xxxx5
4b06fe88-db6e-4cf3-aadd-e1006ada7188  KVM   9216     running           xxxx5
[root@10-bf-48-7f-e7-03 ~]#

The host reboots several times a week with no crash dump in /var/crash and no messages in the /var/adm/messages log. Basically /var/adm/messages looks like there was a hard reset:

2012-11-23T08:54:43.210625+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:14:43.187589+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:34:43.165100+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:54:43.142065+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:14:43.119365+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:34:43.096351+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:54:43.073821+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:57:55.610954+00:00 10-bf-48-7f-e7-03 genunix: [ID 540533 kern.notice] #015SunOS Release 5.11 Version joyent_20121018T224723Z 64-bit
2012-11-23T10:57:55.610962+00:00 10-bf-48-7f-e7-03 genunix: [ID 299592 kern.notice] Copyright (c) 2010-2012, Joyent Inc. All rights reserved.
2012-11-23T10:57:55.610967+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: lgpg
2012-11-23T10:57:55.610971+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: tsc
2012-11-23T10:57:55.610974+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: msr
2012-11-23T10:57:55.610978+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mtrr
2012-11-23T10:57:55.610981+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pge
2012-11-23T10:57:55.610984+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: de
2012-11-23T10:57:55.610987+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cmov
2012-11-23T10:57:55.610995+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mmx
2012-11-23T10:57:55.611000+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mca
2012-11-23T10:57:55.611004+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pae
2012-11-23T10:57:55.611008+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cv8

The problem is that sometimes the host loses the network interface on reboot so we need to perform a manual hardware reset to bring it back. We do not have physical or virtual access to the server console - no KVM, no iLO or anything like this. So, the only way to debug is to analyze crash dumps/log files. I am not a SmartOS/Solaris expert so I am not sure how to proceed. Is there any equivalent of Linux netconsole for SmartOS? Can I just redirect the console output to the network port somehow? Maybe I am missing something obvious and crash information is located somewhere else.

Shane Madden
  • 112,982
  • 12
  • 174
  • 248
Alex
  • 7,789
  • 4
  • 36
  • 51

1 Answers1

1

Run the command dumpadm to check crash dumps are enabled, and on what device.

If it is enabled and you find no crash dumps, then suspect a hardware fault and ask your hosting company to move you to a different physical server. (They will also be able to check hardware logs and fault lights and call the vendor and so on.)

ramruma
  • 2,730
  • 1
  • 14
  • 8