
I've been debugging an issue with my Synology NAS where the kernel panics (or at least that's what I think is happening). When it happens, the NAS stops responding to ping, and the only way to recover it is to force a power-off by holding the power button for 10 seconds.

It happens randomly (anywhere from a few times a month to a few times a day). I've worked with customer service and have already done the following:

  • Replaced the NAS itself (it's a Synology DS1815+)
  • Replaced the RAM module (the expansion RAM I added)

The kernel panic still seems to occur, and I'm starting to suspect that one of the drives is the culprit. Does anyone recognize this panic?

Machine info:

root@Apollo:~# uname -a
Linux Apollo 3.10.77 #8451 SMP Wed Jan 4 00:31:32 CST 2017 x86_64 GNU/Linux synology_avoton_1815+
root@Apollo:~# ls -l /dev/sd*
brw------- 1 root root 8,  0 Jan 13 08:38 /dev/sda
brw------- 1 root root 8,  1 Jan 13 08:38 /dev/sda1
brw------- 1 root root 8,  2 Jan 13 08:38 /dev/sda2
brw------- 1 root root 8,  5 Jan 13 08:38 /dev/sda5
brw------- 1 root root 8, 16 Jan 13 08:38 /dev/sdb
brw------- 1 root root 8, 17 Jan 13 08:38 /dev/sdb1
brw------- 1 root root 8, 18 Jan 13 08:38 /dev/sdb2
brw------- 1 root root 8, 21 Jan 13 08:38 /dev/sdb5
brw------- 1 root root 8, 32 Jan 13 08:38 /dev/sdc
brw------- 1 root root 8, 33 Jan 13 08:38 /dev/sdc1
brw------- 1 root root 8, 34 Jan 13 08:38 /dev/sdc2
brw------- 1 root root 8, 37 Jan 13 08:38 /dev/sdc5
brw------- 1 root root 8, 48 Jan 13 08:38 /dev/sdd
brw------- 1 root root 8, 49 Jan 13 08:38 /dev/sdd1
brw------- 1 root root 8, 50 Jan 13 08:38 /dev/sdd2
brw------- 1 root root 8, 53 Jan 13 08:38 /dev/sdd5
brw------- 1 root root 8, 64 Jan 13 08:38 /dev/sde
brw------- 1 root root 8, 65 Jan 13 08:38 /dev/sde1
brw------- 1 root root 8, 66 Jan 13 08:38 /dev/sde2
brw------- 1 root root 8, 69 Jan 13 08:38 /dev/sde5
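The listing follows the standard Linux numbering for SCSI disks: block major 8 covers sda through sdp, and each disk gets 16 consecutive minors, where minor 16n is the whole disk and 16n + p is its partition p. So sde is minor 64, and sde5, the partition named in the `sde5 auto_remap 0` line of the log, is minor 69. A minimal sketch of that arithmetic, assuming only major 8 devices; `sd_name` is a hypothetical helper, not part of any tool:

```python
# Decode SCSI disk (sd) device numbers as shown by `ls -l /dev/sd*`.
# Block major 8 covers the first 16 sd disks (sda..sdp); each disk
# owns 16 consecutive minors: minor = 16 * disk_index + partition.
import string

SD_MAJOR = 8
MINORS_PER_DISK = 16


def sd_name(major: int, minor: int) -> str:
    """Return the /dev name (e.g. 'sde5') for a major-8 sd device number."""
    if major != SD_MAJOR:
        raise ValueError("only block major 8 (sda..sdp) is handled here")
    disk_index, partition = divmod(minor, MINORS_PER_DISK)
    name = "sd" + string.ascii_lowercase[disk_index]
    # Partition 0 is the whole disk, which has no numeric suffix.
    return name if partition == 0 else f"{name}{partition}"


if __name__ == "__main__":
    for minor in (0, 5, 64, 69):
        print(f"8,{minor:>3} -> /dev/{sd_name(SD_MAJOR, minor)}")
```

The kernel name only identifies the device node; which drive bay `sde` sits in depends on how the NAS enumerates its ports, so matching the serial number remains the reliable way to find the physical disk.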

The panic:

2017-01-12T15:18:39-08:00 Apollo kernel: [73339.744805] ata3.00: read unc at 11156987776
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.749591] lba 11156987776 start 9453280 end 11720838239
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.755633] sde5 auto_remap 0
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.758952] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.766156] ata3.00: irq_stat 0x40000001
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.770544] ata3.00: failed command: READ DMA EXT
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.775809] ata3.00: cmd 25/00:00:00:1c:02/00:04:99:02:00/e0 tag 20 dma 524288 in
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.775809]          res 51/40:00:80:1f:02/00:04:99:02:00/e0 Emask 0x9 (media error)
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.792832] ata3.00: status: { DRDY ERR }
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.797315] ata3.00: error: { UNC }
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.832084] Result: hostbyte=0x00 driverbyte=0x08
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.840854] Sense Key : 0x3 [current] [descriptor]
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.846232] Descriptor sense data with sense descriptors (in hex):
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.867505] ASC=0x11 ASCQ=0x4
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.874733] cdb[0]=0x88: 88 00 00 00 00 02 99 02 1c 00 00 00 04 00 00 00
2017-01-12T15:18:39-08:00 Apollo kernel: [73339.882314] end_request: I/O error, dev sde, sector 11156986880
2017-01-12T15:18:39-08:00 Apollo kernel: [73345.792522] ------------[ cut here ]------------
2017-01-12T15:18:39-08:00 Apollo kernel: [73345.797710] WARNING: at /source/lio-4.x/target_core_segment_lock.c:124 seglock_unlock+0x5d/0xf0 [target_core_mod]()
2017-01-12T15:18:39-08:00 Apollo kernel: [73345.809379] Modules linked in: bridge snd_usb_hiface stp aufs snd_pcm_oss macvlan veth xt_conntrack xt_addrtype snd_mixer_oss nf_conntrack_ipv6 snd_usb_audio snd_pcm nf_defrag_ipv6 snd_timer ip6table_filter ip6_tables snd_hwdep ipt_MASQUERADE snd_usbmidi_lib xt_REDIRECT snd_rawmidi xt_nat snd_seq_device iptable_nat snd nf_nat_ipv4 nf_nat snd_page_alloc xt_recent soundcore xt_iprange xt_limit xt_state xt_tcpudp xt_multiport xt_LOG nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_filter ip_tables x_tables iscsi_target_mod(O) target_core_ep(O) target_core_file(O) target_core_iblock(O) target_core_mod(O) syno_extent_pool(PO) rodsp_ep(O) cifs udf isofs loop nfsd exportfs rpcsec_gss_krb5 hid_generic usbhid hid usblp usb_storage 8021q bonding avoton_synobios(PO) leds_lp3943 btrfs synoacl_vfs(PO) zlib_deflate hfsplus md4 hmac libcrc32c compat(O) igb(O) i2c_algo_bit e1000e(O) fuse vfat fat crc32c_intel aesni_intel glue_helper lrw gf128mul ablk_helper arc4 cryptd ecryptfs sha512_generic sha256_generic sha1_generic ecb aes_x86_64 authenc des_generic ansi_cprng cts md5 cbc cpufreq_powersave cpufreq_performance acpi_cpufreq mperf processor thermal_sys cpufreq_stats freq_table dm_snapshot crc_itu_t crc_ccitt quota_v2 quota_tree psnap p8022 llc sit tunnel4 ip_tunnel ipv6 zram(C) sg etxhci_hcd xhci_hcd uhci_hcd ehci_pci ehci_hcd usbcore usb_common [last unloaded: avoton_synobios]
2017-01-12T15:18:39-08:00 Apollo kernel: [73345.945727] CPU: 2 PID: 9316 Comm: kworker/u17:14 Tainted: P         C O 3.10.77 #8451
2017-01-12T15:18:41-08:00 Apollo kernel: [73345.954582] Hardware name: Insyde MohonPeak/Type2 - Board Product Name1, BIOS M.110 2014/12/23
2017-01-12T15:18:41-08:00 Apollo kernel: [73345.964218] Workqueue: tcm_dio_wq transport_cmd_defer_exec_work_func [target_core_mod]
2017-01-12T15:18:41-08:00 Apollo kernel: [73345.973075]  ffffffff8149d52e ffffffff81033bc8 0000000000000000 ffff8801d47d08c0
2017-01-12T15:18:41-08:00 Apollo kernel: [73345.981359]  ffff8801d47d08c8 0000000000000282 0000000004b39480 ffffffffa06bbfdd
2017-01-12T15:18:41-08:00 Apollo kernel: [73345.989642]  ffff8801d2760c00 ffff8801d27609f0 0000000000000000 ffff8801d5404000
2017-01-12T15:18:41-08:00 Apollo kernel: [73345.997928] Call Trace:
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.000662]  [<ffffffff8149d52e>] ? dump_stack+0xc/0x15
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.006502]  [<ffffffff81033bc8>] ? warn_slowpath_common+0x58/0x70
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.013418]  [<ffffffffa06bbfdd>] ? seglock_unlock+0x5d/0xf0 [target_core_mod]
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.021500]  [<ffffffffa06a9f09>] ? transport_cmd_defer_exec_work_func+0x89/0x150 [target_core_mod]
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.031617]  [<ffffffff8104d654>] ? process_one_work+0x144/0x3d0
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.038331]  [<ffffffff8104e1ed>] ? worker_thread+0x10d/0x3a0
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.044754]  [<ffffffff8104e0e0>] ? manage_workers.isra.26+0x280/0x280
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.052052]  [<ffffffff81053882>] ? kthread+0xb2/0xc0
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.057698]  [<ffffffff810537d0>] ? kthread_create_on_node+0x110/0x110
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.064995]  [<ffffffff814a3408>] ? ret_from_fork+0x58/0x90
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.071223]  [<ffffffff810537d0>] ? kthread_create_on_node+0x110/0x110
2017-01-12T15:18:41-08:00 Apollo kernel: [73346.078519] ---[ end trace 8d9001a14e0d0c9f ]---
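Decoding the error report shows that its numbers are internally consistent and that the drive itself reported the failure. In libata's `cmd`/`res` lines the fields are taskfile registers in the order COMMAND/FEATURE:NSECT:LBAL:LBAM:LBAH/HOB_FEATURE:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEVICE, and in the `res` line the first two fields hold the status and error registers. The `cmd` line is command 0x25 (READ DMA EXT) starting at LBA 11156986880 for 1024 sectors, which matches both the `end_request` sector and the `dma 524288` byte count at 512 bytes per sector. The `res` line carries status 0x51 (decoded by the kernel as DRDY ERR) and error 0x40 (UNC) at LBA 11156987776, the same LBA as the `read unc at` line. The SCSI translation agrees: CDB opcode 0x88 is READ(16) with the same start LBA, and sense key 0x3 with ASC/ASCQ 0x11/0x04 is MEDIUM ERROR, "unrecovered read error, auto reallocate failed". A minimal sketch of that arithmetic; `parse_taskfile` and `parse_read16` are hypothetical helpers written for this log, and the register order above is assumed from libata's report format:

```python
# Cross-check the LBAs in the libata error report and its SCSI translation.
# Assumed libata taskfile layout for the "cmd"/"res" lines:
#   CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV
# In a "res" line, CMD holds the status register and FEAT the error register.

SECTOR_BYTES = 512  # consistent with "dma 524288" for a 1024-sector read


def parse_taskfile(tf: str) -> dict:
    """Decode one libata taskfile string into register values and an LBA48."""
    cmd, low, high, dev = tf.split("/")
    feat, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hfeat, hnsect, hlbal, hlbam, hlbah = (int(x, 16) for x in high.split(":"))
    lba = (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    # LBA48 sector count; a register value of 0 means 65536 sectors.
    # Only meaningful for the "cmd" line.
    count = ((hnsect << 8) | nsect) or 65536
    return {"cmd": int(cmd, 16), "feat": feat, "lba": lba, "count": count, "dev": int(dev, 16)}


def parse_read16(cdb_hex: str) -> tuple:
    """Return (start LBA, transfer length) from a SCSI READ(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    # Bytes 2..9: big-endian LBA; bytes 10..13: big-endian transfer length.
    return int.from_bytes(cdb[2:10], "big"), int.from_bytes(cdb[10:14], "big")


if __name__ == "__main__":
    cmd = parse_taskfile("25/00:00:00:1c:02/00:04:99:02:00/e0")
    res = parse_taskfile("51/40:00:80:1f:02/00:04:99:02:00/e0")
    cdb_lba, cdb_len = parse_read16("88 00 00 00 00 02 99 02 1c 00 00 00 04 00 00 00")

    print(f"cmd 0x{cmd['cmd']:02x}: LBA {cmd['lba']}, {cmd['count']} sectors "
          f"({cmd['count'] * SECTOR_BYTES} bytes)")
    print(f"READ(16): LBA {cdb_lba}, {cdb_len} sectors")
    print(f"status 0x{res['cmd']:02x}, error 0x{res['feat']:02x}, failing LBA {res['lba']}")
    print(f"failing sector is {res['lba'] - cmd['lba']} sectors into the request")
```

Run on the values from this log, the failing sector lands 896 sectors into the 1024-sector read, i.e. a single unreadable spot on the platter rather than a transport or controller fault.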
kalbasit
  • Have you tried replacing the failing disk? – Michael Hampton Jan 13 '17 at 18:33
  • No, I haven't. But there's no indication (besides the panic) that it's failing: S.M.A.R.T., including the extended test, looks healthy. Do you think I should replace sde? – kalbasit Jan 13 '17 at 18:39
  • This log indicates the drive has experienced an uncorrectable read error. Of course replacing it is the first step! – Michael Hampton Jan 13 '17 at 18:40
  • OK, I just ordered a replacement. I hope this fixes the issue. – kalbasit Jan 14 '17 at 04:31
  • I got the replacement drive and swapped it in, which put the MD array in a degraded state. I started a volume repair (from the Synology GUI), and after a few minutes the NAS crashed again. I'm starting to doubt the drive is the problem (I'm sure I replaced the correct one; I matched it by serial number). I put the original disk back, and the array is rebuilding again. I have a feeling this is related to iSCSI and not to anything else; I use iSCSI as the storage for my Xen setup. I'm also in contact with support trying to nail this issue down. – kalbasit Jan 18 '17 at 00:19
  • Synology released an update today that fixes an issue with iSCSI (in response to my issue). I hope it was iSCSI after all and it won't happen again. So far the drive (sde, which showed the error above) is behaving correctly and is currently undergoing an extended S.M.A.R.T. test. I'll try to get more information from Synology's developers on what exactly was going on. – kalbasit Jan 19 '17 at 19:53
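A healthy overall S.M.A.R.T. verdict and a logged UNC read error are not necessarily contradictory: the overall verdict compares normalized attribute values against the drive's thresholds, while a few unreadable sectors typically show up first in raw counters such as Current_Pending_Sector. A minimal sketch, assuming smartmontools' `smartctl -A` is installed and prints the usual ATA attribute table with RAW_VALUE as the tenth column (vendor firmware can vary this), that extracts the raw counts for attributes 5, 197 and 198; `WATCHED`, `media_counters` and `main` are hypothetical names, and Python 3.7+ is assumed:

```python
# Pull raw media-error counters out of `smartctl -A` output.
import re
import shutil
import subprocess
import sys

# Attributes whose nonzero raw values usually point at bad media.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# ID, ATTRIBUTE_NAME, FLAG, then VALUE WORST THRESH TYPE UPDATED WHEN_FAILED,
# then the leading digits of RAW_VALUE.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+(?:\s+\S+){6}\s+(\d+)")


def media_counters(smartctl_output: str) -> dict:
    """Map watched attribute names to their raw values from `smartctl -A` text."""
    found = {}
    for line in smartctl_output.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) in WATCHED:
            found[m.group(2)] = int(m.group(3))
    return found


def main(device: str) -> None:
    if shutil.which("smartctl") is None:
        print("smartctl not found on PATH")
        return
    # No check=True: smartctl's exit status is a bitmask and can be
    # nonzero even when the attribute table was printed.
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    for name, raw in media_counters(out).items():
        print(f"{name}: {raw}" + ("  <-- nonzero" if raw else ""))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/dev/sde")
```

Run it as root with the device node as the argument. A nonzero pending or offline-uncorrectable count on the drive would corroborate the log even while the overall health check still reports PASSED.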
