I have two boxes with exactly the same hardware configuration. Both have RAID0 (created with mdadm) over SATA disks, but I get very different cached-read figures when testing with the hdparm command.

$ hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   924 MB in  2.00 seconds = 462.20 MB/sec
 Timing buffered disk reads: 290 MB in  3.04 seconds =  95.44 MB/sec

On the other box:

$ hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   18404 MB in  2.00 seconds = 9201.42 MB/sec
 Timing buffered disk reads: 322 MB in  3.00 seconds = 107.18 MB/sec

Can someone help me solve this? Why am I getting low cached reads on one of the servers? Is there a BIOS setting that controls this?

Edit 1:

I ran the perf tool on both machines.

Output on the box that shows the problem:

# perf stat dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 6.22039 s, 82.3 MB/s

Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

    5692.357502  task-clock-msecs         #      0.913 CPUs
             72  context-switches         #      0.000 M/sec
              7  CPU-migrations           #      0.000 M/sec
            220  page-faults              #      0.000 M/sec
      975469183  cycles                   #    171.365 M/sec
     1374701843  instructions             #      1.409 IPC
          65350  cache-references         #      0.011 M/sec
          17986  cache-misses             #      0.003 M/sec

Output on the other box:

$ perf stat dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 0.280017 s, 1.8 GB/s

 Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

     278.388839  task-clock-msecs         #      0.994 CPUs
              0  context-switches         #      0.000 M/sec
              0  CPU-migrations           #      0.000 M/sec
            220  page-faults              #      0.001 M/sec
      725024593  cycles                   #   2604.359 M/sec
     1371073131  instructions             #      1.891 IPC
          15921  cache-references         #      0.057 M/sec
           1847  cache-misses             #      0.007 M/sec

I don't understand why there are so many context switches or why task-clock-msecs is so high. Can someone please help me debug this further?
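
One thing the perf output already allows is a rough comparison of the effective clock rate on each box: cycles divided by task-clock. The sketch below only redoes that arithmetic with the figures printed above; the function name `effective_mhz` is made up for illustration, and the result is an observation to follow up on, not a diagnosis.

```shell
# Effective clock rate while dd was on-CPU: cycles / task-clock.
# effective_mhz is a hypothetical helper, not part of perf.
effective_mhz() {
    # $1 = cycles, $2 = task-clock in milliseconds
    awk -v c="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", c / ms / 1000 }'
}

# Figures copied from the two perf stat runs above.
slow=$(effective_mhz 975469183 5692.357502)
fast=$(effective_mhz 725024593 278.388839)

echo "slow box: ${slow} MHz, fast box: ${fast} MHz"
```

Both runs execute nearly the same number of instructions (about 1.37 billion), yet the slow box retires them at roughly one fifteenth of the cycle rate. That would be consistent with the CPU on that box being clocked down or throttled, though the counters alone do not prove it. On a Linux host with a cpufreq driver loaded, the current frequency in kHz can usually be read from /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq.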

Edit 2:

Here is the smartctl output for /dev/sda and /dev/sdb:

# /usr/local/sbin/smartctl -i /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.32-5-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST91000640NS
Serial Number:    9XG40W61
LU WWN Device Id: 5 000c50 050920a25
Add. Product Id:  DELL(tm)
Firmware Version: AA09
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Sep 29 00:03:33 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


# /usr/local/sbin/smartctl -i /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.32-5-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST91000640NS
Serial Number:    9XG41K1L
LU WWN Device Id: 5 000c50 05093c434
Add. Product Id:  DELL(tm)
Firmware Version: AA09
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Sep 29 00:03:33 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
pradeepchhetri
  • Can you run `smartctl -i /dev/sda` on both hosts and provide the output? – Matthew Ife Sep 28 '13 at 17:42
  • Actually `smartctl -i /dev/sda` and `smartctl -i /dev/sdb` might be a good idea, you never know, perhaps b and a are different drives but B on one host is A on the other and vice versa. – Matthew Ife Sep 28 '13 at 17:47
  • @MIfe: I have updated with the smartctl output. – pradeepchhetri Sep 28 '13 at 18:38
  • Based on your other results, I'm willing to bet that the 'bad' machine has more processes in its run queue than the other, i.e. the load on host A is generally higher than on host B. – Matthew Ife Sep 28 '13 at 19:40

2 Answers

The disk read numbers are within about 10 percent or so of each other. I wouldn't worry about such a small difference. (The cached reads are not disk I/O and have nothing to do with your disks. See the hdparm man page for an explanation of why this figure is meaningless as a disk measurement.)
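
To put numbers on that, here is a small sketch that compares the two boxes using only the hdparm figures quoted in the question. The helper name `ratio` is an assumption made for this example.

```shell
# Ratio of the fast box's figure to the slow box's figure.
# ratio is a hypothetical helper used only for this comparison.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Buffered disk reads (MB/sec): these do touch the disks.
buffered=$(ratio 107.18 95.44)

# Cached reads (MB/sec): served from memory, no disk involved.
cached=$(ratio 9201.42 462.20)

echo "buffered reads differ by a factor of ${buffered}"
echo "cached reads differ by a factor of ${cached}"
```

The buffered figures are close, while the cached figures are about twenty times apart, which points at CPU or memory behaviour rather than the drives.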

David Schwartz

I think @DavidSchwartz has the right idea here: the problem is clearly somewhere else, since the disk speeds look pretty similar.

The best resource I've seen for tracking down performance issues is the USE method (utilization, saturation, errors) described by Brendan Gregg. Since you're using Linux, he also has a related post tailored specifically to Linux.
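
As one possible first step in that direction, here is a minimal sketch of a CPU saturation check: the 1-minute load average divided by the number of CPUs. The function name `load_per_cpu` and the sample line are assumptions for illustration; a value well above 1 on one box and not the other would suggest a busier run queue there.

```shell
# CPU saturation sketch: 1-minute load average per CPU.
# load_per_cpu is a hypothetical helper, not a standard tool.
load_per_cpu() {
    # $1 = a line in /proc/loadavg format, $2 = number of CPUs
    echo "$1" | awk -v n="$2" '{ printf "%.2f\n", $1 / n }'
}

# Made-up sample line, so the arithmetic can be checked anywhere.
sample="4.00 3.50 3.00 5/200 1234"
load_per_cpu "$sample" 8

# On a live Linux host, the same check would be:
#   load_per_cpu "$(cat /proc/loadavg)" "$(nproc)"
```

Running the live variant on both boxes at the same time gives a like-for-like comparison before moving on to the other resources in the checklist.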

Paccc