A Debian 6.0.6 system contains 74 Toshiba DT01ABA200 2TB drives. The drives identify themselves as Hitachi HDS5C3020BLE630 running firmware revision MZ4OAAB0. 64 drives are attached via HP SAS expander cards to an LSI 2008 SAS controller, 5 drives are connected directly to the mainboard, 4 drives are connected to a Sil based PCI controller, and the last drive is only powered and has no data cable connected. The onboard BIOS of both the LSI controller and the Sil card is disabled, and the mpt2sas and sata_sil modules are removed from the kernel (Linux debian 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64 GNU/Linux). The mpt2sas module is loaded after boot by a modprobe command in /etc/rc.local. The 74 drives are not partitioned, not formatted, and not mounted.

The system consumes:

  • with 0 drives: 70.6 - 70.9 Watt (also 15 minutes after boot);
  • with 74 drives: 330 - 360 Watt just after boot (equivalent to 3.5 - 3.9 W per drive in idle state, after subtracting the 0-drive baseline);
  • with 74 drives: 420 - 466 Watt, each time in the 15th minute of uptime (equivalent to 4.7 - 5.3 W per drive in idle state, after subtracting the 0-drive baseline).
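The per-drive figures above can be reproduced with a small shell sketch. It assumes that the low and high totals are paired with the low and high 0-drive baselines respectively (the measurements do not state this), and `per_drive_watts` is a hypothetical helper name.

```shell
# per_drive_watts TOTAL_W BASELINE_W DRIVES
# Prints (total - baseline) / drives, rounded to one decimal.
per_drive_watts() {
    awk -v total="$1" -v base="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", (total - base) / n }'
}

per_drive_watts 330 70.6 74   # just after boot, low end   -> 3.5
per_drive_watts 360 70.9 74   # just after boot, high end  -> 3.9
per_drive_watts 420 70.6 74   # 15th minute, low end       -> 4.7
per_drive_watts 466 70.9 74   # 15th minute, high end      -> 5.3
```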

The drive specification lists 4.7W as read/write, and 3.3W as idle power consumption.

The increased power consumption is most likely on the 5V line, because after roughly 1 minute the "over current protection" (OCP) of the power supply (PSU) shuts down the power. The PSU is a single-rail model with an OCP of > 122A on the 12V line and > 55A on the 5V line.

Troubleshooting so far:

  • It doesn't matter whether the drive's APM value is set to disabled or to 1 (maximum power saving).
  • The operating system records no read/write activity in /proc/diskstats. The values there are identical (28 read, 0 write operations) to those immediately after the modprobe operation.
  • I can't test what happens when booting into the mainboard's BIOS (to exclude any OS intervention), because the Super Micro X8SI6-F mainboard running firmware 06/27/12 has a bug that incorrectly reads a +74.0 C CPU sensor temperature as "High" in BIOS mode and shuts down the power after 1 minute.
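The /proc/diskstats observation can be double-checked by snapshotting the counters right after the modprobe and again after the 15th minute. A minimal sketch, assuming the drives show up as whole-disk sdX devices; `io_counts` is a hypothetical helper name, and it relies on fields 4 and 8 of /proc/diskstats being the completed reads and completed writes.

```shell
# io_counts [FILE]
# Prints "name reads_completed writes_completed" for whole sd* disks
# (partitions such as sda1 are skipped). FILE defaults to /proc/diskstats.
io_counts() {
    awk '$3 ~ /^sd[a-z]+$/ { print $3, $4, $8 }' "${1:-/proc/diskstats}"
}

# Possible usage (not run here):
#   io_counts > /tmp/diskstats.before
#   ... wait until after the 15th minute of uptime ...
#   io_counts | diff /tmp/diskstats.before -
```

An empty diff would confirm that the operating system issued no reads or writes during the power spike, which points at activity inside the drives themselves.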

What might be causing the read/write activity on all drives in the 15th minute after boot, and how can it be prevented?

Martin Schröder
Pro Backup
  • Just curious... What type of system is this? Backup system? All software RAID? – ewwhite Dec 16 '12 at 14:21
  • Currently just testing, intended for backup storage without any RAID. The redundancy will be supplied by optional secondary and tertiary servers. – Pro Backup Dec 16 '12 at 14:27
  • @ewwhite reminds me of Backblaze pods. Someone had to mention that name. – Dmitri Chubarov Dec 16 '12 at 15:57
  • @Dmitri Chubarov It's like a Backblaze storage pod, but without SATA port multipliers, with 5U height, no RAID, 74 instead of 45 drives, a single PSU, only 2.0 Watt of power consumption for cooling, and, when all drives are spinning idle, a temperature difference of 6 degrees between the coolest and the warmest drive. – Pro Backup Dec 16 '12 at 18:52

1 Answer

Sounds very much like the drives are doing SMART scrubbing (automatic offline testing).

smartctl -a /dev/hdx

should confirm the configuration with:

Auto Offline Data Collection: Enabled.

Disable with:

smartctl --offlineauto=off /dev/hdx
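With 74 drives it may be easier to loop over them. A rough sketch, assuming the drives appear as /dev/sd* nodes (adjust the glob to your system); `disable_auto_offline` and the `SMARTCTL` override are hypothetical names introduced here, only the `--offlineauto=off` option comes from smartctl itself.

```shell
# disable_auto_offline DEVICE...
# Runs "smartctl --offlineauto=off" on every device node that exists.
# Set SMARTCTL=echo for a dry run that only prints the arguments.
disable_auto_offline() {
    for dev in "$@"; do
        # Skip unmatched globs and missing device nodes.
        [ -e "$dev" ] || continue
        "${SMARTCTL:-smartctl}" --offlineauto=off "$dev"
    done
}

# Possible usage (as root, not run here):
#   disable_auto_offline /dev/sd[a-z] /dev/sd[a-z][a-z]
```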

It could be something else too...

techraf
Chris S
  • `smartctl --offlineauto=off` did the trick. For at least 32 minutes there have been no more huge power consumption increases causing the PSU OCP to shut down. As a bonus, hdparm -SX now sets the drives from "active/idle" to "standby". However, the drives attached to the sata_sil controller could not be controlled. Temporarily plugging these drives into another controller is the workaround. The offline data collection setting survives reboots and power cycles. – Pro Backup Dec 16 '12 at 19:17