
I'm running FreeBSD 10.1-RELEASE-p19 on a VPS (VMware).

My ISP is experiencing rapid data growth, and these messages spontaneously started to show up in our logs a week ago.

Sep 25 09:00:50 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Sep 25 09:00:50 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
Sep 25 09:00:50 srv03 kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 03 f9 6c 22 00 00 40 00
Sep 25 09:00:50 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error

Sometimes the server loses contact with the storage entirely, then panics and restarts. This often occurs every even hour, presumably triggered by a routine job (migration/backup).

Until my ISP has added more storage systems, which will lower the load on the storage, I really want to try to do something.

I have found this, but am unsure how to patch/use the information: https://svnweb.freebsd.org/base?view=revision&revision=278111

I also found this (vfs.unmapped_buf_allowed=0), but I'm unsure whether it is related: https://www.freebsd.org/releases/10.1R/errata.html#open-issues

camcontrol tags da0 -v

(pass1:mpt0:0:0:0): dev_openings  127
(pass1:mpt0:0:0:0): dev_active    0
(pass1:mpt0:0:0:0): devq_openings 127
(pass1:mpt0:0:0:0): devq_queued   0
(pass1:mpt0:0:0:0): held          -1
(pass1:mpt0:0:0:0): mintags       2
(pass1:mpt0:0:0:0): maxtags       255

gstat info when errors occur: [image]

Any thoughts, hints, ideas would be really really really appreciated.

Thanks!

Aknot

1 Answer

If you are using VMware, and mpt(4) is therefore purely virtual, I would suggest changing the controller to something simpler, like ICH10.

Otherwise I suggest you experiment with camcontrol tags, either increasing or decreasing the queue length.

If you choose to reprovision the disks using another driver, note that a SAS -> SATA controller change may change the device naming: /dev/daX will probably become /dev/adaX, so unless you are using ZFS or mounting your disks via disk labels, you'll have to edit /etc/fstab.
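A minimal sketch of that /etc/fstab edit, assuming the entries start with plain /dev/daN device names. The helper name `da_to_ada` and the sample line are made up for illustration, and the example only previews the rewrite without touching the real file:

```shell
#!/bin/sh
# Rewrite leading /dev/daN device names to /dev/adaN in fstab-style input.
# Entries that already use /dev/adaN, labels, or ZFS are left untouched.
da_to_ada() {
    sed -E 's#^/dev/da([0-9])#/dev/ada\1#'
}

# Preview on a hypothetical fstab line (not taken from the asker's system).
printf '/dev/da0p2\t/\tufs\trw\t1\t1\n' | da_to_ada
```

To apply it, something like `da_to_ada < /etc/fstab > /tmp/fstab.new` followed by a `diff` against the original lets you review the result before replacing the file.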

As for your gstat output, there's clearly something wrong with it, probably due to the nature of the virtual environment support in FreeBSD. 600% load is nonsense. I suggest you report this in the FreeBSD Bugzilla.

P.S. The advice to change the disk provisioning controller type still stands.

P.P.S. Alternatively, I would try to lower the queue length of mpt(4) to 128 or even 64.
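A rough sketch of how the queue length could be lowered with camcontrol, using the da0 device from the question and the value 64 as a starting point. The wrapper function `set_queue_depth` and the `RUN` switch are my own illustration and not part of camcontrol; by default the script only prints the command it would run:

```shell
#!/bin/sh
# Build (and optionally run) a camcontrol command that sets the number of
# tagged openings for one device. -N sets the tag count, -v prints the
# resulting settings, as in the "camcontrol tags da0 -v" output above.
set_queue_depth() {
    dev="$1"
    tags="$2"
    cmd="camcontrol tags $dev -N $tags -v"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd            # needs root on the FreeBSD guest
    else
        echo "$cmd"     # dry run: only show what would be executed
    fi
}

set_queue_depth da0 64
```

Running it with `RUN=1` as root applies the change. Re-running `camcontrol tags da0 -v` afterwards should show the new dev_openings value, and the same call with a higher number reverts it.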

drookie
  • Thanks for your answer @drookie, please let me get back with a `gstat` snapshot to start with. The man page says that novice users (like me) should stay away from `camcontrol`, which sounds a bit scary. – Aknot Sep 25 '15 at 09:44
  • You're right, but this time I advise you to do so. I tried this myself on mpt(4) with the LSI 1064 controller family and encountered no devastating or even harmful consequences (though I didn't notice any improvements either, in my case). But you're right, it's your equipment. I updated my answer too, please note the changes. – drookie Sep 25 '15 at 10:32
  • Thanks again @drookie - I have found this, which seems to address and fix this exact problem: https://svnweb.freebsd.org/base?view=revision&revision=278111 An upgrade to 10.2-RELEASE resolved the issues. Thanks for your input! – Aknot Oct 01 '15 at 06:09