Disk Cached IO very slow

Question

I have two boxes with exactly the same hardware configuration. Both have RAID0 (created using mdadm) over SATA disks, but I am getting different cached-read results when testing with the hdparm command.

$ hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   924 MB in  2.00 seconds = 462.20 MB/sec
 Timing buffered disk reads: 290 MB in  3.04 seconds =  95.44 MB/sec

On the other box:

$ hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   18404 MB in  2.00 seconds = 9201.42 MB/sec
 Timing buffered disk reads: 322 MB in  3.00 seconds = 107.18 MB/sec

Can someone help me solve this issue? Why am I getting low cached reads on one of the servers? Is there a BIOS setting that controls this?

Edit 1:

I tried the perf tool on both machines.

Output on the box with the problem:

# perf stat dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 6.22039 s, 82.3 MB/s

Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

    5692.357502  task-clock-msecs         #      0.913 CPUs
             72  context-switches         #      0.000 M/sec
              7  CPU-migrations           #      0.000 M/sec
            220  page-faults              #      0.000 M/sec
      975469183  cycles                   #    171.365 M/sec
     1374701843  instructions             #      1.409 IPC
          65350  cache-references         #      0.011 M/sec
          17986  cache-misses             #      0.003 M/sec

Output on the other box:

$ perf stat dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 0.280017 s, 1.8 GB/s

 Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

     278.388839  task-clock-msecs         #      0.994 CPUs
              0  context-switches         #      0.000 M/sec
              0  CPU-migrations           #      0.000 M/sec
            220  page-faults              #      0.001 M/sec
      725024593  cycles                   #   2604.359 M/sec
     1371073131  instructions             #      1.891 IPC
          15921  cache-references         #      0.057 M/sec
           1847  cache-misses             #      0.007 M/sec

I don't understand why there are so many context switches, and why the task-clock-msecs value is so high. Can someone please help me debug this further?
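One way to read these two runs: dd executes almost the same number of instructions on both boxes (about 1.37 billion), but the slow box gets only about 171 million cycles per second of task time versus about 2,604 million on the other. Dividing cycles by task-clock time gives the average clock rate the process actually ran at. The sketch below, a hypothetical helper named `effective_clock`, does that division with awk on saved `perf stat` output. It assumes the older perf format shown above: a `task-clock-msecs` line and a `cycles` line, each with a plain number (no thousands separators) in the first field.

```shell
# effective_clock (hypothetical helper): read `perf stat` output on stdin and
# print cycles per microsecond of task-clock time, i.e. the average clock in MHz.
# Assumes the older perf format: values in field 1, no thousands separators.
effective_clock() {
  awk '
    /task-clock/   { ms  = $1 }   # task time in milliseconds
    $2 == "cycles" { cyc = $1 }   # total CPU cycles counted
    END {
      if (ms > 0) printf "%.1f MHz\n", cyc / (ms * 1000)
    }'
}

# Usage (file name is a placeholder; perf stat and dd both write to stderr):
#   perf stat dd if=/dev/zero of=/dev/null count=1000000 2> slow.txt
#   effective_clock < slow.txt
```

Fed the numbers above, this prints 171.4 MHz for the problem box and 2604.4 MHz for the other, matching the M/sec figure perf already shows on the cycles line. Since the instruction counts are nearly identical, the gap points at the CPU rather than the disks.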

Edit 2:

I am getting the following output from the smartctl command:

# /usr/local/sbin/smartctl -i /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.32-5-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST91000640NS
Serial Number:    9XG40W61
LU WWN Device Id: 5 000c50 050920a25
Add. Product Id:  DELL(tm)
Firmware Version: AA09
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Sep 29 00:03:33 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


# /usr/local/sbin/smartctl -i /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.32-5-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST91000640NS
Serial Number:    9XG41K1L
LU WWN Device Id: 5 000c50 05093c434
Add. Product Id:  DELL(tm)
Firmware Version: AA09
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Sep 29 00:03:33 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

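The two drives above report the same model, firmware, capacity and link speed; only the serial number, WWN and timestamp differ. To make the same comparison across both hosts without those per-drive fields getting in the way, one could filter the identity block first. The sketch below uses a hypothetical `smart_ident` helper; the choice of fields to keep is an assumption, not an official smartctl option.

```shell
# smart_ident (hypothetical helper): keep only the `smartctl -i` fields that
# should match between two drives of the same model, dropping per-drive fields
# such as Serial Number, LU WWN Device Id and Local Time.
smart_ident() {
  grep -E '^(Device Model|Firmware Version|User Capacity|Sector Size|Rotation Rate|SATA Version is):'
}

# Usage: run on each host, copy the files to one place, then diff them.
#   /usr/local/sbin/smartctl -i /dev/sda | smart_ident > "$(hostname)-sda.txt"
#   diff hostA-sda.txt hostB-sda.txt && echo "identity fields match"
```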
Can you run `smartctl -i /dev/sda` on both hosts and provide the output? – Matthew Ife, Sep 28 '13 at 17:42
Actually, running both `smartctl -i /dev/sda` and `smartctl -i /dev/sdb` might be a good idea. You never know: perhaps sda and sdb are different drives, and sdb on one host is sda on the other, or vice versa. – Matthew Ife, Sep 28 '13 at 17:47
Based on your other results, I'm willing to bet that the 'bad' machine has more processes in its run queue than the other, i.e. the load on host A is generally higher than on host B. – Matthew Ife, Sep 28 '13 at 19:40
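A quick way to test the run-queue theory is to compare each host's load average with its CPU count. The sketch below uses a hypothetical `load_per_cpu` helper that reads one line in `/proc/loadavg` format and divides the 1-minute load by the CPU count passed as an argument. Note that Linux load averages also count tasks in uninterruptible sleep, so this is an approximation of run-queue pressure, not an exact measure.

```shell
# load_per_cpu (hypothetical helper): read a /proc/loadavg line on stdin and
# print the 1-minute load average divided by the CPU count given as $1.
# Values persistently above 1.00 suggest more tasks competing than CPUs.
load_per_cpu() {
  awk -v ncpu="$1" '{ printf "%.2f\n", $1 / ncpu }'
}

# Usage on each host:
#   load_per_cpu "$(nproc)" < /proc/loadavg
#   vmstat 1 5    # the "r" column shows runnable tasks per sample
```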

David Schwartz · Answer 1 · 2013-09-26T23:26:02.300

score 3

The disk read numbers are within about 10 percent of each other. I wouldn't worry about such a small difference. (The cached reads are not disk I/O and have nothing to do with your disks, or with I/O at all. See the hdparm man page for an explanation of why this figure is meaningless as a disk benchmark.)
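To see that a "cached reads" figure tracks memory and CPU throughput rather than the disks, one can run a copy that never touches a disk at all, for example dd from /dev/zero to /dev/null, much like the test in the question's Edit 1. The sketch below wraps the rate extraction in a hypothetical `dd_rate` helper that prints the last comma-separated field of dd's summary line. It assumes GNU coreutils dd, whose summary ends with a rate such as "82.3 MB/s".

```shell
# dd_rate (hypothetical helper): read dd's stderr summary on stdin and print
# the throughput, i.e. the last comma-separated field of the last line.
# Assumes GNU coreutils dd output, e.g. "... copied, 6.22039 s, 82.3 MB/s".
dd_rate() {
  tail -n 1 | awk -F', ' '{ print $NF }'
}

# Memory-to-memory copy: nothing is read from or written to a disk.
# Run the same command on both boxes and compare with `hdparm -T`.
#   dd if=/dev/zero of=/dev/null bs=1M count=4096 2>&1 | dd_rate
```

If the two boxes differ here by roughly the same factor as their `hdparm -T` results, the difference lies in CPU or memory, not in the RAID array.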

edited Sep 26 '13 at 23:26

answered Sep 26 '13 at 23:14

But whenever I install a package on this box, it takes too long to install. I can't figure out how to proceed. – pradeepchhetri Sep 27 '13 at 04:35
You'll have to troubleshoot that issue. Your `hdparm` output might be a clue, but it's just a starting point. What does `iostat` say while the installation is going slowly? What does `top` say? Is the CPU maxed? – David Schwartz Sep 27 '13 at 06:14
iostat looks OK: not much disk %util and no CPU iowait, but the CPU gets maxed out at 100% when I install a package. What could be the reason, David? – pradeepchhetri Sep 27 '13 at 07:06
Maybe the CPU is overheating. Check the heatsink, fan, and thermal paste. – David Schwartz Sep 27 '13 at 07:06
David, I verified from the DRAC that temperature, fans, power supplies, everything is OK. – pradeepchhetri Sep 27 '13 at 07:15
Can you help me? I have updated the question with some more information. – pradeepchhetri Sep 28 '13 at 17:47
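Healthy temperatures, fans and power supplies do not by themselves rule out the CPU running at a reduced clock, for example through power-management settings (the question already asks about BIOS settings). A quick, rough check is the per-core frequency the kernel reports in /proc/cpuinfo. The hypothetical `cpu_mhz_range` helper below prints the lowest and highest `cpu MHz` value; the field name is the one x86 Linux uses, and the value is only the kernel's view, which may not reflect every form of hardware throttling.

```shell
# cpu_mhz_range (hypothetical helper): read /proc/cpuinfo-style text on stdin
# and print the lowest and highest "cpu MHz" value across logical CPUs.
cpu_mhz_range() {
  awk -F': *' '
    /^cpu MHz/ {
      mhz = $2 + 0
      if (n == 0 || mhz < min) min = mhz
      if (n == 0 || mhz > max) max = mhz
      n++
    }
    END { if (n) printf "min=%.0f max=%.0f MHz (%d logical CPUs)\n", min, max, n }'
}

# Usage on both boxes, ideally while the slow package install is running:
#   cpu_mhz_range < /proc/cpuinfo
```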

score 3 · Answer 2 · answered Sep 27 '13 at 18:57

I think @DavidSchwartz has the right idea here: the problem is obviously somewhere else, since the disk speeds look pretty similar.

The best resource I've seen for tracking down performance-related issues is the USE method (Utilization, Saturation, Errors) described by Brendan Gregg. Since you're using Linux, there is a related post, also by him, tailored specifically to Linux.
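As a small illustration of the USE idea applied to a single resource, the CPU: utilization is the share of time spent in user, system and steal time, and saturation shows up as more runnable tasks than CPUs; errors need other tools and are not covered. The hypothetical `use_cpu` helper below derives the first two from `vmstat` output, looking up the `us`, `sy`, `st` and `r` columns by name in vmstat's header instead of assuming fixed positions. It is a sketch of one row of the checklist, not a substitute for it.

```shell
# use_cpu (hypothetical helper): read `vmstat INTERVAL COUNT` output on stdin,
# with the CPU count as $1. Prints average utilization (us+sy+st) and average
# run-queue length (r), flagging saturation when r exceeds the CPU count.
use_cpu() {
  awk -v ncpu="$1" '
    NR == 2 {                        # second header line names the columns
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    NR <= 3 { next }                 # skip the first header line and the first sample
    {
      u = $(col["us"]) + $(col["sy"])
      if ("st" in col) u += $(col["st"])
      util += u; rq += $(col["r"]); n++
    }
    END {
      if (n == 0) exit 1
      printf "util=%.0f%% runq=%.1f %s\n", util / n, rq / n, (rq / n > ncpu + 0 ? "SATURATED" : "ok")
    }'
}

# Usage: vmstat 1 10 | use_cpu "$(nproc)"
```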
