I am getting the infamous "aacraid: Host adapter abort request" error with my new Adaptec RAID controller under high I/O. I have read on several forums, including Adaptec's, that setting the /sys/block/sdX/device/timeout value to 45 will fix this. However, I am running Ubuntu Server 12.04, which already sets this value to 45 by default. I also tried the next suggestion, which was to update my motherboard's BIOS to the latest version.
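
For reference, here is a minimal sketch of how the timeout value mentioned above can be read and changed through sysfs. The function names and the SYSFS_ROOT variable are only illustrative (SYSFS_ROOT exists so the functions can be pointed at a scratch directory); on a real system the path is /sys and writing requires root.

```shell
#!/bin/sh
# Sketch: read and set the SCSI command timeout of a block device.
# SYSFS_ROOT is a hypothetical override for experimenting; real systems use /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

show_timeout() {
    # $1 = device name, e.g. sda
    cat "$SYSFS_ROOT/block/$1/device/timeout"
}

set_timeout() {
    # $1 = device name, $2 = timeout in seconds (needs root on a real system)
    echo "$2" > "$SYSFS_ROOT/block/$1/device/timeout"
}
```

Something like `show_timeout sda` should print the current value, and `set_timeout sda 45` would write the value suggested on the forums. The change does not survive a reboot.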

Has anyone else run into this "aacraid: Host adapter abort request" error even after taking these steps?

This is what I see in my syslog:

kernel: [ 5493.523282] aacraid: Host adapter abort request (4,0,0,0)
Jan  6 20:29:15 server kernel: [ 5493.523309] aacraid: Host adapter abort request (4,0,0,0)
Jan  6 20:29:15 server kernel: [ 5493.523375] aacraid: Host adapter reset request. SCSI hang ?

Here's my uname -a:

Linux server 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Thanks All,

Jim


2 Answers

In case you haven't resolved this yet: I recently wrestled with the same issue, which quickly escalated to the array hanging every 5 minutes for a couple of minutes as the I/O increased. Ubuntu uses the CFQ scheduler by default, which isn't optimal for hardware RAID. Switch the scheduler to noop with:

echo noop > /sys/block/<blockdevice>/queue/scheduler

Personally I'm stuck with an old kernel, but I've been told that upgrading to the latest aacraid driver should also fix the issue. I can't verify that, though. Even so, switch to noop. Since sysfs settings don't persist across reboots, you might want to set the scheduler in /etc/rc.local or use the elevator= boot parameter.
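
To make the rc.local route concrete, here is a rough sketch of a small script that applies noop to the devices named on its command line. The function name set_noop and the SYSFS_ROOT variable are my own inventions for illustration, and the sketch assumes the running kernel actually offers noop for the device.

```shell
#!/bin/sh
# Sketch: apply the noop I/O scheduler to each device given as an argument.
# SYSFS_ROOT is a hypothetical override for testing; real systems use /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

set_noop() {
    # $1 = device name, e.g. sda
    sched_file="$SYSFS_ROOT/block/$1/queue/scheduler"
    if [ -w "$sched_file" ]; then
        echo noop > "$sched_file"
    else
        # Missing device or insufficient privileges: report and carry on
        echo "skipping $1: $sched_file is not writable" >&2
    fi
}

# With no arguments this loop does nothing
for dev in "$@"; do
    set_noop "$dev"
done
```

Saved under a path of your choosing and called from /etc/rc.local with the device names (for example sda), it would reapply the setting at every boot. For the boot parameter route on Ubuntu, adding elevator=noop to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub should have the same effect for all devices.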

I'd pay attention to other kernel parameters as well. The settings on Ubuntu are reasonable defaults for most common hardware, but servers usually need special attention regardless of the distro you're on.

Kev

If your Adaptec RAID controller has its own firmware/BIOS, you may need to update that. We also got "aacraid: Host adapter abort request" during high I/O, and found a firmware release newer than our current one whose notes said "Fixed an issue where the firmware could hang during high I/O stress." The release notes are at http://download.adaptec.com/pdfs/readme/relnotes_arc_fw-b18937_asm-18837.pdf

The above release notes list the following Adaptec models: 2045, 2405, 2405Q, 2805, 5085, 5405, 5405Z, 5445, 5445Z, 5805, 5805Q, 5805Z, 5805ZQ, 51245, 51645, 52445.

We also got log lines like:

sd 0:0:0:0: timing out command, waited 360s

and

Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
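
If you want to check how often these messages show up on your own system, a rough sketch like the following could help. The function name count_aacraid_events is made up for illustration, and the log file path is whatever your syslog daemon writes kernel messages to.

```shell
#!/bin/sh
# Sketch: count kernel log lines that match the aacraid abort/reset
# messages or the SCSI "timing out command" message.
count_aacraid_events() {
    # $1 = path to a log file, e.g. /var/log/syslog
    grep -c \
        -e 'aacraid: Host adapter abort request' \
        -e 'aacraid: Host adapter reset request' \
        -e 'timing out command' \
        "$1"
}
```

Calling it with /var/log/syslog (or wherever your kernel messages end up) prints the number of matching lines. A count that climbs under load would point in the same direction as what we saw.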

While searching online for other people with a similar issue, we found that another line of cards has had the following issues fixed by firmware, which could be relevant:

The above two apply to Adaptec models 7805, 7805Q, 78165, 71605E, 71605, 71605Q, 71685, 72405, 8805, 8885, 8885Q, and 81605ZQ.

sa289
  • I think my controller would hang when I used the `arcconf` command line utility to query the status of the array as part of our regular Nagios monitoring scripts. As the firmware notes indicate, using `arcconf` can cause the controller to hang. – Stefan Lasiewski Sep 15 '14 at 20:33