We currently have three systems running CentOS on identical hardware and software configurations, and all three are experiencing random system hangs. A hang can occur as soon as 20 minutes after boot, or not until one or two weeks later. We booted a standalone live Ubuntu image and ran stress tests on it nonstop without any problems. We suspect a driver or some other software installed on our CentOS systems, but we are not sure how to determine which one is causing it.

How should we proceed if we want to determine what is causing our systems to hang?

  KERNEL: /lib/debug/lib/modules/3.10.0-1062.12.1.el7.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2020-08-28-19:02:49/vmcore  [PARTIAL DUMP]
    CPUS: 72
    DATE: Fri Aug 28 19:02:35 2020
  UPTIME: 6 days, 13:03:56 LOAD AVERAGE: 7.87, 7.35, 7.45
   TASKS: 5679
NODENAME: zagreb
 RELEASE: 3.10.0-1062.12.1.el7.x86_64
 VERSION: #1 SMP Tue Feb 4 23:02:59 UTC 2020
 MACHINE: x86_64  (3000 Mhz)
  MEMORY: 1023.4 GB
   PANIC: "BUG: unable to handle kernel NULL pointer dereference at           (null)"
     PID: 19718
 COMMAND: "9_scheduler"
    TASK: ffff8a8bc9ab1070  [THREAD_INFO: ffff8a8be0618000]
     CPU: 34
   STATE: TASK_RUNNING (PANIC)
crash>

Here is the backtrace:

crash> bt
PID: 19718  TASK: ffff8a8bc9ab1070  CPU: 34  COMMAND: "9_scheduler"
 #0 [ffff8a8be061ba90] machine_kexec at ffffffff90665b34
 #1 [ffff8a8be061baf0] __crash_kexec at ffffffff90722352
 #2 [ffff8a8be061bbc0] crash_kexec at ffffffff90722440
 #3 [ffff8a8be061bbd8] oops_end at ffffffff90d85798
 #4 [ffff8a8be061bc00] no_context at ffffffff90675bb4
 #5 [ffff8a8be061bc50] __bad_area_nosemaphore at ffffffff90675e82
 #6 [ffff8a8be061bca0] bad_area_nosemaphore at ffffffff90675fa4
 #7 [ffff8a8be061bcb0] __do_page_fault at ffffffff90d88750
 #8 [ffff8a8be061bd20] do_page_fault at ffffffff90d88975
 #9 [ffff8a8be061bd50] page_fault at ffffffff90d84778
    [exception RIP: anon_vma_clone+117]
    RIP: ffffffff908008e5  RSP: ffff8a8be061be08  RFLAGS: 00010286
    RAX: ffff8a90d42e95f0  RBX: 0000000000000000  RCX: 0000000000ea39f5
    RDX: 0000000000000040  RSI: 0000000000000200  RDI: ffff8a0f7fc07b00
    RBP: ffff8a8be061be48   R8: 000000000001f0a0   R9: ffffffff908008d4
    R10: ffff8ad35135e0c0  R11: 0000000000000000  R12: ffff8a90d42e9d18
    R13: ffff8b0bea29d410  R14: ffff8a90d42e9cb0  R15: ffff8a90d42e95f0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff8a8be061be50] __split_vma at ffffffff907f962e
#11 [ffff8a8be061be90] do_munmap at ffffffff907f992a
#12 [ffff8a8be061bee0] vm_munmap at ffffffff907f9cb5
#13 [ffff8a8be061bf30] sys_munmap at ffffffff907faf52
#14 [ffff8a8be061bf50] system_call_fastpath at ffffffff90d8dede
    RIP: 00007f1ef3f82dd7  RSP: 00007f1e53ffebc0  RFLAGS: 00000246
    RAX: 000000000000000b  RBX: 0000000000040000  RCX: 00007f1ef3f6d727
    RDX: 0000000000000003  RSI: 0000000000040000  RDI: 00007f1d2af40000
    RBP: 0000000000922a40   R8: ffffffffffffffff   R9: 0000000000000000
    R10: 0000000000000022  R11: 0000000000000246  R12: 00007f1e53ffea58
    R13: 00007f1d2af00000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 000000000000000b  CS: 0033  SS: 002b
crash>
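To make the backtrace above easier to read, here is a minimal Python sketch that pulls three facts out of crash's bt text: the faulting function and offset, the kernel call path that led to it, and the user-space syscall that entered the kernel. The helper names (parse_frames, register_blocks, exception_site, syscall_to_fault, user_syscall) and the shortened excerpt are hypothetical, for illustration only; they are not part of crash. It assumes crash prints the +offset in decimal and only decodes x86_64 syscall 11 (munmap).

```python
import re

# Shortened excerpt of the crash `bt` output above (frames #9 to #14), used as sample input.
BT_EXCERPT = """\
 #9 [ffff8a8be061bd50] page_fault at ffffffff90d84778
    [exception RIP: anon_vma_clone+117]
    RIP: ffffffff908008e5  RSP: ffff8a8be061be08  RFLAGS: 00010286
#10 [ffff8a8be061be50] __split_vma at ffffffff907f962e
#11 [ffff8a8be061be90] do_munmap at ffffffff907f992a
#12 [ffff8a8be061bee0] vm_munmap at ffffffff907f9cb5
#13 [ffff8a8be061bf30] sys_munmap at ffffffff907faf52
#14 [ffff8a8be061bf50] system_call_fastpath at ffffffff90d8dede
    RIP: 00007f1ef3f82dd7  RSP: 00007f1e53ffebc0  RFLAGS: 00000246
    RDX: 0000000000000003  RSI: 0000000000040000  RDI: 00007f1d2af40000
    ORIG_RAX: 000000000000000b  CS: 0033  SS: 002b
"""

# " #10 [stack address] function at address"
FRAME_RE = re.compile(r"^\s*#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")
# "[exception RIP: symbol+offset]"
EXC_RE = re.compile(r"\[exception RIP:\s+(\S+?)\+(\d+)\]")
# "NAME: hexvalue" register pairs, e.g. "RSI: 0000000000040000"
REG_RE = re.compile(r"\b([A-Z][A-Z0-9_]*):\s+([0-9a-f]+)\b")

# Assumption: only the x86_64 syscall number needed for this trace is listed.
X86_64_SYSCALLS = {11: "munmap"}


def parse_frames(text):
    """Return [(frame_number, function_name), ...] in the order crash prints them."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2)))
    return frames


def register_blocks(text):
    """Map each frame's function name to the registers printed under it.

    Registers printed right after the "[exception RIP: ...]" line go under the
    key "exception", because they describe the faulting kernel context.
    """
    blocks = {}
    current = None
    for line in text.splitlines():
        frame = FRAME_RE.match(line)
        if frame:
            current = frame.group(2)
        elif EXC_RE.search(line):
            current = "exception"
        elif current is not None:
            for name, value in REG_RE.findall(line):
                blocks.setdefault(current, {})[name] = int(value, 16)
    return blocks


def exception_site(text):
    """Return (symbol, offset, rip, function_start) for the faulting instruction."""
    m = EXC_RE.search(text)
    symbol, offset = m.group(1), int(m.group(2))
    rip = register_blocks(text)["exception"]["RIP"]
    # Assumes crash prints the +offset in decimal, so the function starts at RIP - offset.
    return symbol, offset, rip, rip - offset


def syscall_to_fault(text):
    """Kernel call path from the syscall entry to the faulting function, outermost first."""
    symbol = None
    callers = []
    for line in text.splitlines():
        exc = EXC_RE.search(line)
        if exc:
            symbol = exc.group(1)
            continue
        frame = FRAME_RE.match(line)
        # Frames printed before the exception line are fault handling, not callers.
        if frame and symbol is not None:
            callers.append(frame.group(2))
    if symbol is None:
        return []
    return list(reversed(callers)) + [symbol]


def user_syscall(text):
    """Decode the syscall from the user-space registers under the last frame."""
    last_frame = parse_frames(text)[-1][1]
    regs = register_blocks(text)[last_frame]
    nr = regs["ORIG_RAX"]
    # x86_64 syscall ABI: number in orig_rax, first two arguments in rdi and rsi.
    return X86_64_SYSCALLS.get(nr, f"syscall {nr}"), regs["RDI"], regs["RSI"]


if __name__ == "__main__":
    symbol, offset, rip, start = exception_site(BT_EXCERPT)
    print(f"fault in {symbol}+{offset} (RIP {rip:#x}, function starts at {start:#x})")
    print(" -> ".join(syscall_to_fault(BT_EXCERPT)))
    name, addr, length = user_syscall(BT_EXCERPT)
    print(f"user space called {name}(addr={addr:#x}, length={length:#x})")
```

Run against the excerpt, it reports the fault in anon_vma_clone+117 (function start 0xffffffff90800870), the path system_call_fastpath -> sys_munmap -> vm_munmap -> do_munmap -> __split_vma -> anon_vma_clone, and a user-space munmap of 0x40000 bytes (256 KiB) at 0x7f1d2af40000, issued by the 9_scheduler task. Pasting the full bt output in place of the excerpt should work the same way, assuming the format matches.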
Andy