
I'm having a problem with an Adaptec 5805 RAID card

http://www.adaptec.com/en-us/support/raid/sas_raid/sas-5805/

(with two SAS disks in RAID) and a Gigabyte GA-H67A-D3H-B3 motherboard

http://www.gigabyte.com/products/product-page.aspx?pid=3866#sp

running CentOS 6 as a web server.

Short story: when I boot the server, the RAID card runs at full speed, with a transfer rate of over 250Mb/s. Within 60 minutes at most, I get an IRQ error, IRQ 16 is disabled, and from then on the card manages no more than 2.5Mb/s (but it still works). I need to fix this so that the card runs at full speed all the time.

Long story:

1] The motherboard doesn't have a PCIe x8 slot to fit the RAID card. I tried the x16 slot, but in that slot the card is not detected at all and the system boots without it. So I used the x4 slot, where the card (surprisingly for me) works great. Except for the IRQ ...

2] There are two SATA disks connected to the motherboard, each as primary on its channel:

SAMSUNG HD502HJ SAMSUNG HD103UJ

Then there is an additional network card in the first of the normal PCI slots (in the picture at the link above, it's the right-most white PCI slot, next to the "DUAL BOOT" label on the mobo).

And the RAID card is in the PCIe x4 slot (next to those three white PCI slots).

Nothing else is used. I do not use any USB devices or anything else, just two SATA disks, two network connectors (mobo and card), and the RAID card with two SAS disks connected.

3] The system is, like I said, CentOS 6

uname -a

Linux 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux

CPU is

Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

lspci -v

00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09)
    Flags: bus master, fast devsel, latency 0
    Capabilities: [e0] Vendor Specific Information <?>

00:02.0 VGA compatible controller: Intel Corporation Sandy Bridge Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
    Subsystem: Giga-byte Technology Device d000
    Flags: bus master, fast devsel, latency 0, IRQ 10
    Memory at fb400000 (64-bit, non-prefetchable) [size=4M]
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    I/O ports at ff00 [size=64]
    Expansion ROM at <unassigned> [disabled]
    Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
    Capabilities: [d0] Power Management version 2
    Capabilities: [a4] PCI Advanced Features

00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
    Subsystem: Giga-byte Technology Device 1c3a
    Flags: bus master, fast devsel, latency 0, IRQ 10
    Memory at fbfff000 (64-bit, non-prefetchable) [size=16]
    Capabilities: [50] Power Management version 3
    Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+

00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 05) (prog-if 20 [EHCI])
    Subsystem: Giga-byte Technology Device 5006
    Flags: bus master, medium devsel, latency 0, IRQ 18
    Memory at fbffe000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
    Capabilities: [58] Debug port: BAR=1 offset=00a0
    Capabilities: [98] PCI Advanced Features
    Kernel driver in use: ehci_hcd

00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    Memory behind bridge: fb800000-fbbfffff
    Prefetchable memory behind bridge: 00000000dc000000-00000000dc0fffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1c.5 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 6 (rev b5) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
    I/O behind bridge: 0000d000-0000dfff
    Prefetchable memory behind bridge: 00000000fbd00000-00000000fbdfffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1c.6 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5) (prog-if 01 [Subtractive decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=03, subordinate=04, sec-latency=0
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: fbc00000-fbcfffff
    Prefetchable memory behind bridge: 00000000dc100000-00000000dc1fffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2

00:1c.7 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
    Memory behind bridge: fbe00000-fbefffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 05) (prog-if 20 [EHCI])
    Subsystem: Giga-byte Technology Device 5006
    Flags: bus master, medium devsel, latency 0, IRQ 23
    Memory at fbffd000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
    Capabilities: [58] Debug port: BAR=1 offset=00a0
    Capabilities: [98] PCI Advanced Features
    Kernel driver in use: ehci_hcd

00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 05)
    Subsystem: Giga-byte Technology Device 5001
    Flags: bus master, medium devsel, latency 0
    Capabilities: [e0] Vendor Specific Information <?>
    Kernel modules: iTCO_wdt

00:1f.2 IDE interface: Intel Corporation Cougar Point 4 port SATA IDE Controller (rev 05) (prog-if 8f [Master SecP SecO PriP PriO])
    Subsystem: Giga-byte Technology Device b002
    Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
    I/O ports at fe00 [size=8]
    I/O ports at fd00 [size=4]
    I/O ports at fc00 [size=8]
    I/O ports at fb00 [size=4]
    I/O ports at fa00 [size=16]
    I/O ports at f900 [size=16]
    Capabilities: [70] Power Management version 3
    Capabilities: [b0] PCI Advanced Features
    Kernel driver in use: ata_piix
    Kernel modules: ata_generic, pata_acpi, ata_piix

00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 05)
    Subsystem: Giga-byte Technology Device 5001
    Flags: medium devsel, IRQ 18
    Memory at fbffc000 (64-bit, non-prefetchable) [size=256]
    I/O ports at 0500 [size=32]
    Kernel driver in use: i801_smbus
    Kernel modules: i2c-i801

00:1f.5 IDE interface: Intel Corporation Cougar Point 2 port SATA IDE Controller (rev 05) (prog-if 85 [Master SecO PriO])
    Subsystem: Giga-byte Technology Device b002
    Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
    I/O ports at f700 [size=8]
    I/O ports at f600 [size=4]
    I/O ports at f500 [size=8]
    I/O ports at f400 [size=4]
    I/O ports at f300 [size=16]
    I/O ports at f200 [size=16]
    Capabilities: [70] Power Management version 3
    Capabilities: [b0] PCI Advanced Features
    Kernel driver in use: ata_piix
    Kernel modules: ata_generic, pata_acpi, ata_piix

01:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
    Subsystem: Adaptec ASR5805
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at fb800000 (64-bit, non-prefetchable) [size=2M]
    [virtual] Expansion ROM at dc000000 [disabled] [size=512K]
    Capabilities: [98] Power Management version 2
    Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+
    Capabilities: [d0] Express Endpoint, MSI 00
    Capabilities: [90] Vital Product Data
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: aacraid
    Kernel modules: aacraid

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
    Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
    Flags: bus master, fast devsel, latency 0, IRQ 32
    I/O ports at de00 [size=256]
    Memory at fbdff000 (64-bit, prefetchable) [size=4K]
    Memory at fbdf8000 (64-bit, prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [70] Express Endpoint, MSI 01
    Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
    Capabilities: [d0] Vital Product Data
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Virtual Channel <?>
    Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
    Kernel driver in use: r8169
    Kernel modules: r8169

03:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 30) (prog-if 01 [Subtractive decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=03, secondary=04, subordinate=04, sec-latency=32
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: fbc00000-fbcfffff
    Prefetchable memory behind bridge: 00000000dc100000-00000000dc1fffff
    Capabilities: [90] Power Management version 2
    Capabilities: [a0] Subsystem: Giga-byte Technology Device 5000

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
    Subsystem: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet
    Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
    I/O ports at ee00 [size=256]
    Memory at fbcff000 (32-bit, non-prefetchable) [size=256]
    [virtual] Expansion ROM at dc100000 [disabled] [size=64K]
    Capabilities: [dc] Power Management version 2
    Kernel driver in use: r8169
    Kernel modules: r8169

05:00.0 USB Controller: Device 1b6f:7023 (rev 01) (prog-if 30)
    Subsystem: Device 1b6f:7023
    Flags: bus master, fast devsel, latency 0, IRQ 11
    Memory at fbef8000 (64-bit, non-prefetchable) [size=32K]
    Capabilities: [50] Power Management version 3
    Capabilities: [70] MSI: Enable- Count=1/4 Maskable+ 64bit+
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [190] Device Serial Number 01-01-01-01-01-01-01-01

lspci -vv

01:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
Subsystem: Adaptec ASR5805
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 4 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at fb800000 (64-bit, non-prefetchable) [size=2M]
[virtual] Expansion ROM at dc000000 [disabled] [size=512K]
Capabilities: [98] Power Management version 2
    Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+
    Address: 0000000000000000  Data: 0000
Capabilities: [d0] Express (v1) Endpoint, MSI 00
    DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 <1us
        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
    DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
        MaxPayload 128 bytes, MaxReadReq 512 bytes
    DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
    LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 <128ns, L1 unlimited
        ClockPM- Surprise- LLActRep- BwNot-
    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
    LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [90] Vital Product Data
    Unknown small resource type 00, will not decode more.
Capabilities: [100] Advanced Error Reporting
    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
    AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Kernel driver in use: aacraid
Kernel modules: aacraid
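
As a rough sanity check on the x4 slot, the `LnkCap` and `LnkSta` lines above can be compared programmatically. This is only a minimal Python sketch: the function names are made up for illustration, and the bandwidth estimate assumes PCIe 1.x signalling (2.5GT/s with 8b/10b encoding, roughly 250 MB/s per lane in each direction).

```python
import re

# Matches e.g. "LnkCap: Port #0, Speed 2.5GT/s, Width x8, ..." and
# "LnkSta: Speed 2.5GT/s, Width x4, ..." in `lspci -vv` output.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s, Width x(\d+)")


def link_info(lspci_vv_text):
    """Return {"LnkCap": (speed_gt_s, width), "LnkSta": (speed_gt_s, width)}."""
    info = {}
    for kind, speed, width in LINK_RE.findall(lspci_vv_text):
        info[kind] = (float(speed), int(width))
    return info


def approx_bandwidth_mb_s(speed_gt_s, width):
    """Approximate one-direction bandwidth, assuming 8b/10b encoding."""
    # 8 payload bits per 10 bits on the wire, then 8 bits per byte.
    return speed_gt_s * 1000 * 8 / 10 / 8 * width


# The two link lines copied from the lspci -vv output in this post.
SAMPLE = """\
    LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 <128ns, L1 unlimited
    LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""

if __name__ == "__main__":
    for kind, (speed, width) in link_info(SAMPLE).items():
        print(kind, f"x{width}", f"~{approx_bandwidth_mb_s(speed, width):.0f} MB/s")
```

Under that assumption the negotiated x4 link still has about 1000 MB/s of raw bandwidth, so the narrower slot on its own would not obviously explain a drop to 2.5Mb/s.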

cat /proc/interrupts

       CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
0:        128          0          0          0          0          0          0          0   IO-APIC-edge      timer
1:        105          0        606       4366          0          0          0          0   IO-APIC-edge      i8042
8:          1          0          0          0          0          0          0          0   IO-APIC-edge      rtc0
9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
16:       1381          0     197881        730          0          0          0          9   IO-APIC-fasteoi   aacraid
18:       1695          0          0          0      13372   60347990          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, eth1
19:       4637          0      14949    6352494          0          0          0     106473   IO-APIC-fasteoi   ata_piix, ata_piix
23:         33          0         27         12          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2
24:        291          0          0          0          0          0          0          0  HPET_MSI-edge      hpet2
25:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet3
26:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet4
27:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet5
28:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet6
32:       1275          0          0          0          0       1905   21317086          0   PCI-MSI-edge      eth0
NMI:       1873      10150       1974       1672        702       3046       1825        780   Non-maskable interrupts
LOC:   17501877   13611350   13868117    3612581    1520650    1850972    8633075    1486682   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0          0          0          0          0   Performance pending work
RES:       5238      34250      12858       4299       1555       4833       5663       2485   Rescheduling interrupts
CAL:        334        302        429        414        421        464        465        468   Function call interrupts
TLB:       7863     154723      12147      11152      14099      33766      42580      11065   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:        293        293        293        293        293        293        293        293   Machine check polls
ERR:          7
MIS:          0
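
To check for shared interrupt lines without reading the table by eye, something like the following Python sketch could be run against saved `/proc/interrupts` text. The function names are hypothetical, and the parsing assumes the layout shown above (one count column per CPU, then the interrupt type, then a comma-separated handler list).

```python
def parse_interrupts(text):
    """Map each numbered IRQ to (total count across CPUs, list of handler names)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())              # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        label = label.strip()
        if not label.isdigit():               # skip NMI, LOC, ERR, ... summary rows
            continue
        fields = rest.split()
        total = sum(int(f) for f in fields[:ncpu])
        # fields[ncpu] is the interrupt type, e.g. "IO-APIC-fasteoi";
        # everything after it is the handler list.
        handlers = " ".join(fields[ncpu + 1:]).split(", ")
        table[label] = (total, handlers)
    return table


def shared_irqs(table):
    """Return only the IRQ lines that have more than one handler registered."""
    return {irq: names for irq, (_, names) in table.items() if len(names) > 1}


# Header plus three rows copied from the /proc/interrupts output in this post.
SAMPLE = """\
       CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
16:       1381          0     197881        730          0          0          0          9   IO-APIC-fasteoi   aacraid
18:       1695          0          0          0      13372   60347990          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, eth1
19:       4637          0      14949    6352494          0          0          0     106473   IO-APIC-fasteoi   ata_piix, ata_piix
NMI:       1873      10150       1974       1672        702       3046       1825        780   Non-maskable interrupts
"""

if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    print("IRQ 16:", table["16"])
    print("shared:", shared_irqs(table))
```

On these rows only IRQ 18 and IRQ 19 come back as shared; IRQ 16 has `aacraid` as its only registered handler.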

The module in use is the kmod-aacraid kernel module from ELRepo for CentOS 6

Installed Packages
Name       : kmod-aacraid
Arch       : x86_64
Version    : 1.1.7
Release    : 1.el6.elrepo
Size       : 340 k
Repo       : installed
From repo  : elrepo
Summary    : aacraid kernel module(s)
URL        : http://www.adaptec.com/
License    : GPLv2
Description: This package provides the aacraid kernel module(s) built
       : for the Linux kernel using the x86_64 family of processors.

and the error from the log

Dec 15 14:02:33 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 15 14:02:33 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-71.29.1.el6.x86_64 #1
Dec 15 14:02:33 kernel: Call Trace:
Dec 15 14:02:33 kernel: <IRQ>  [<ffffffff810da96b>] __report_bad_irq+0x2b/0xa0
Dec 15 14:02:33 kernel: [<ffffffff810dab6c>] note_interrupt+0x18c/0x1d0
Dec 15 14:02:33 kernel: [<ffffffff810db255>] handle_fasteoi_irq+0xc5/0xf0
Dec 15 14:02:33 kernel: [<ffffffff81015fb9>] handle_irq+0x49/0xa0
Dec 15 14:02:33 kernel: [<ffffffff814d093c>] do_IRQ+0x6c/0xf0
Dec 15 14:02:33 kernel: [<ffffffff81013ad3>] ret_from_intr+0x0/0x11
Dec 15 14:02:33 kernel: <EOI>  [<ffffffff812da962>] ? acpi_idle_enter_c1+0xa3/0xc1
Dec 15 14:02:33 kernel: [<ffffffff812da941>] ? acpi_idle_enter_c1+0x82/0xc1
Dec 15 14:02:33 kernel: [<ffffffff813df687>] cpuidle_idle_call+0xa7/0x140
Dec 15 14:02:33 kernel: [<ffffffff81011e96>] cpu_idle+0xb6/0x110
Dec 15 14:02:33 kernel: [<ffffffff814c27d8>] start_secondary+0x1fc/0x23f
Dec 15 14:02:33 kernel: handlers:
Dec 15 14:02:33 kernel: [<ffffffffa002a590>] (aac_rx_intr_message+0x0/0xc0 [aacraid])
Dec 15 14:02:33 kernel: Disabling IRQ #16
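
To find out exactly when this happens after each boot, the log can be scanned for these reports. A minimal Python sketch, assuming the message format shown above; the function name and the returned dictionary layout are made up for illustration.

```python
import re

NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLED = re.compile(r"Disabling IRQ #(\d+)")
# Handler lines look like "[<addr>] (function+0xOFF/0xLEN [module])".
HANDLER = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)")


def spurious_irq_events(log_text):
    """Return one dict per "nobody cared" report found in the log text."""
    events = []
    current = None
    for line in log_text.splitlines():
        m = NOBODY_CARED.search(line)
        if m:
            current = {"irq": int(m.group(1)), "handlers": [], "disabled": False}
            events.append(current)
            continue
        if current is None:
            continue
        h = HANDLER.search(line)
        if h:
            # (function name, module name) of a handler registered on this IRQ
            current["handlers"].append((h.group(1), h.group(2)))
        d = DISABLED.search(line)
        if d and int(d.group(1)) == current["irq"]:
            current["disabled"] = True
            current = None
    return events


# A few lines copied from the log excerpt in this post.
SAMPLE = '''\
Dec 15 14:02:33 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 15 14:02:33 kernel: [<ffffffff810dab6c>] note_interrupt+0x18c/0x1d0
Dec 15 14:02:33 kernel: handlers:
Dec 15 14:02:33 kernel: [<ffffffffa002a590>] (aac_rx_intr_message+0x0/0xc0 [aacraid])
Dec 15 14:02:33 kernel: Disabling IRQ #16
'''

if __name__ == "__main__":
    for event in spurious_irq_events(SAMPLE):
        print(event)
```

Pointing it at the real log file (for example by reading `/var/log/messages` into a string) should list every such event with the handler that was registered at the time.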

I do not see any IRQ 16 conflict, and the suggested irqpoll option doesn't change a thing. I do not need USB, so I can disable it, but this is a production system, so I want to know where the problem is before I start to mess with the BIOS or anything else (I also need to keep downtime to a minimum).

Can anyone help me with diagnosing the problem here?

Scott Pack
Radek
    Problem also occurs on our system (Gigabyte, Adaptec, can't remember exact models) with Debian stock kernel 2.6.32-5-amd64. – thiton Apr 20 '12 at 10:27
    I have upgraded the kernel to 3.2.0-0.bpo.2-amd64, and the system has been stable under very high I/O load for 8 hours now. Problem seems to be fixed. – thiton Apr 20 '12 at 20:45
    Had problems with these cards too. From what I read after throwing them out in frustration :) they REALLY do not like sharing interrupts. Had one system where they simply REFUSED to work stably. Usually I am not the person to say "some hardware just won't play nice together", but here I do. – rackandboneman May 21 '12 at 19:17
