I have a RAID system on Debian:

Disk /dev/sda: 320.1 GB,...
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2432    19535008+  fd  Linux raid autodetect
/dev/sda2            2433        2918     3903795   fd  Linux raid autodetect
/dev/sda3            2919       38913   289129837+  fd  Linux raid autodetect

Disk /dev/sdb: 320.1 GB, ...
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1        2432    19535008+  fd  Linux raid autodetect
/dev/sdb2            2433        2918     3903795   fd  Linux raid autodetect
/dev/sdb3            2919       38913   289129837+  fd  Linux raid autodetect

# df -h 
/dev/md0               19G   12G  6,0G  66% /      type ext3 (rw)
/dev/md2              272G  245G   25G  91% /var   type ext3 (rw)

I would like to check that everything is running fine and configure it so that I get an email if any error occurs.


The only line in my /etc/smartd.conf is:

DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner

Will that scan those two RAID devices?

And in my /etc/cron.d/mdadm there is this line:
57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi

In /usr/share/mdadm/checkarray it says that it initiates a check run of an MD array's redundancy information.

rubo77

1 Answer

If you want to monitor the reliability of your hard disks, install the smartmontools package. It provides utilities that check hard disks for degradation and failure using the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into most modern ATA and SCSI hard disks.

The package contains the smartctl tool, which is useful for checking hard disks from the command line, and the smartd daemon, which checks hard disks at a specified interval, logs warnings and errors to syslog, and can also send them to a specified email address.

To enable the daemon, uncomment the line start_smartd=yes in the file /etc/default/smartmontools. Then define in /etc/smartd.conf which hard disks you want to monitor and start the smartmontools service (see man smartd and man smartd.conf for detailed instructions; the file itself also contains many examples):

/dev/sda  -m admin@example.com -M exec /usr/share/smartmontools/smartd-runner
/dev/sdb  -m admin@example.com -M exec /usr/share/smartmontools/smartd-runner
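For a quick manual check alongside the daemon, the output of smartctl -H can be inspected. The following is only a sketch: the function name smart_health is made up here, and it assumes the ATA-style result line "SMART overall-health self-assessment test result: ..." (SCSI disks word this line differently, so they would show up as UNKNOWN).

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): reads `smartctl -H`
# output on stdin and prints PASSED, FAILED or UNKNOWN.
# Assumption: ATA-style result line as printed by smartctl -H.
smart_health() {
    # Keep only the text after the result-line prefix, if the line exists.
    result=$(sed -n 's/^SMART overall-health self-assessment test result: //p')
    case "$result" in
        PASSED) echo PASSED ;;
        "")     echo UNKNOWN ;;   # result line not found in the input
        *)      echo FAILED ;;    # anything other than PASSED
    esac
}
```

Usage, as root: smartctl -H /dev/sda | smart_health. To verify mail delivery from smartd itself, man smartd.conf describes a -M test directive that sends a single test email when smartd starts.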

You can monitor your md devices with the mdadm tool. To receive alert emails, define a mail recipient in the file /etc/mdadm.conf (on Debian this file is usually /etc/mdadm/mdadm.conf; details in man mdadm.conf and man mdadm):

MAILADDR admin@example.com
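To confirm that the recipient is actually set, the config file can be checked with a small helper. This is a sketch under assumptions: the function name mail_recipient is hypothetical, and the path passed to it depends on your distribution.

```shell
#!/bin/sh
# Hypothetical helper: print the address from the first MAILADDR line
# of the mdadm config file given as $1 (prints nothing if none is set).
mail_recipient() {
    awk '$1 == "MAILADDR" { print $2; exit }' "$1"
}
```

Usage: mail_recipient /etc/mdadm/mdadm.conf. Once an address is configured, mdadm --monitor --scan --oneshot --test should send a test alert for each array, which shows whether mail delivery works.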

Then schedule this command via cron (the interval is up to you; -1 is short for --oneshot):

mdadm --monitor --scan -1
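
Between cron runs, the state of the arrays can also be read from /proc/mdstat. The following is a minimal sketch, not part of mdadm: the function name degraded_arrays is made up here, and it assumes the usual mdstat layout in which each array's status map looks like [UU] and a missing member shows up as an underscore, for example [U_].

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose status
# map contains "_" (a missing or failed member).
# Reads the file given as $1, or /proc/mdstat if no argument is given.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/        { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\]/    { print name }    # status map with a gap
    ' "${1:-/proc/mdstat}"
}
```

Usage: degraded_arrays prints nothing when all arrays are complete, so a non-empty result is the signal to look closer with mdadm --detail /dev/mdX.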
dsmsk80
  • I added the information on my machine to the question. Do I have to change anything there? How can I check if the services are running and the check is done? – rubo77 Sep 02 '13 at 12:41
  • The word DEVICESCAN will cause any remaining lines in this configuration file to be ignored: it tells smartd to scan for all ATA and SCSI devices. Just change the -m email option. – dsmsk80 Sep 04 '13 at 07:28
  • OK. Now the only thing missing is a way to check whether the services are running correctly. Can I somehow simulate a SMART failure and a disk crash in one of the RAID arrays? – rubo77 Sep 04 '13 at 07:39
  • checkarray runs parity checks across all your redundant arrays, which is not exactly what you asked for. So schedule another job with the mdadm --monitor --scan -1 command. – dsmsk80 Sep 04 '13 at 07:40
  • Disk failure simulation: mdadm --manage --set-faulty /dev/mdX /dev/sdY (I don't know the structure of your MD devices). SMART error simulation is not easily doable (you would have to create a smartctl wrapper that produces a SMART report with some errors). – dsmsk80 Sep 04 '13 at 07:44
  • Do I have to change that in my /etc/cron.d/mdadm or in `/etc/smartd.conf`? Do I leave the line with DEVICESCAN? What's the difference from checkarray? – rubo77 Sep 04 '13 at 08:30