
I have a dedicated server with 3 HDDs: a system disk, a backup disk (same model as the system disk) and a data disk. When I copy a lot of data with cp (say, between the backup disk and the data disk), the load average goes sky high.

For instance, the load average at the moment is around 0.57; when copying data it can go beyond 50 or more.

Copying with rsync and --bwlimit=10000 works without a problem. Higher values cause high load.

The file system is ext3.

sda - system disk:

% hdparm -Tt /dev/sda

/dev/sda:
 Timing cached reads:   13444 MB in  2.00 seconds = 6730.82 MB/sec
 Timing buffered disk reads:  232 MB in  3.02 seconds =  76.73 MB/sec

sdb - data disk:

% hdparm -Tt /dev/sdb

/dev/sdb:
 Timing cached reads:   13740 MB in  2.00 seconds = 6879.30 MB/sec
 Timing buffered disk reads:  430 MB in  3.00 seconds = 143.10 MB/sec

sdc - backup disk:

% hdparm -Tt /dev/sdc

/dev/sdc:
 Timing cached reads:   13796 MB in  2.00 seconds = 6907.75 MB/sec
 Timing buffered disk reads:  336 MB in  3.01 seconds = 111.45 MB/sec

iostat -x 1 (when not copying): http://pastebin.com/4WKU7YPa

iostat -x 1 (when copying: sdc > sdb): http://pastebin.com/fHafRCa8

% cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]

The other two disks are set to "deadline" now, but they were "cfq" as well. I just changed them to see if it would make any difference. It doesn't.

Any operation that is more disk intensive kills the server. If some process uses more memory and swapping kicks in, the load goes very high. Sometimes I have to kill a service so the load can go down. There were times when the load went to 500 because of swapping.

The server has 4 GB RAM and a Xeon X3220 @ 2.40GHz. I can accept poor performance when there is not enough RAM, but just copying shouldn't kill the server. This just doesn't seem right.

Any idea what could be the problem? What else should I check? Could it be a bad motherboard controller?

Added:

% fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
 255 heads, 63 sectors/track, 60801 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot      Start         End      Blocks   Id  System
 /dev/sda1   *           1          13      104391   83  Linux
 /dev/sda2              14        1318    10482412+  83  Linux
 /dev/sda3            1319        2623    10482412+  83  Linux
 /dev/sda4            2624       60801   467314785    5  Extended
 /dev/sda5            2624        3928    10482381   83  Linux
 /dev/sda6            3929        4189     2096451   82  Linux swap / Solaris
 /dev/sda7            4190       60670   453683601   83  Linux
 /dev/sda8           60671       60801     1052226   83  Linux

 Disk /dev/sdb: 2000.3 GB, 2000398934016 bytes 
 255 heads, 63 sectors/track, 243201 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot      Start         End      Blocks   Id  System
 /dev/sdb1               1      243201  1953512001   83  Linux

 Disk /dev/sdc: 500.1 GB, 500107862016 bytes
 255 heads, 63 sectors/track, 60801 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot      Start         End      Blocks   Id  System
 /dev/sdc1   *           1       60801   488384001   83  Linux

% cat /proc/scsi/scsi

 Attached devices:
 Host: scsi0 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD5002ABYS-0 Rev: 02.0
   Type:   Direct-Access                    ANSI SCSI revision: 05
 Host: scsi0 Channel: 00 Id: 01 Lun: 00
   Vendor: ATA      Model: WDC WD2003FYYS-0 Rev: 01.0
   Type:   Direct-Access                    ANSI SCSI revision: 05
 Host: scsi1 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD5002ABYS-0 Rev: 02.0
   Type:   Direct-Access                    ANSI SCSI revision: 05
Vald
  • What does the kernel log say? Calling `dmesg` should provide you with the log. It could be that there are bad sectors or one of the disks has problems... – drone.ah Feb 08 '13 at 12:30
  • Nothing in dmesg. Just a bunch of firewall messages like: Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC= and TCP: Treason uncloaked! Peer Nothing else. – Vald Feb 08 '13 at 12:50
  • Can you show the output of `fdisk -l` and `cat /proc/scsi/scsi`? – ewwhite Feb 08 '13 at 12:57
  • I added outputs of those commands to my question. – Vald Feb 08 '13 at 13:24
  • I have seen this before and if I remember correctly, it was a hardware fault. Try swapping the cables out with new ones and if that doesn't work, the controller (if it is different), then the MB. – drone.ah Feb 10 '13 at 10:53
  • The provider replaced the cables, but that didn't change anything, so they replaced the server (except the HDDs), because the controllers are onboard. Still the same behavior: high load when copying. iotop shows kjournald among the top processes. – Vald Feb 22 '13 at 16:21

1 Answer


I think you are in the same situation as I was in Generating a lot of dirty pages is blocking synchronous writes

Since kjournald is among your top processes: you are using a journaling file system (ext3), which is apparently causing synchronous writes to block.

You can try

Reduce the amount of dirty memory a process is able to create:

echo 100000000 > /proc/sys/vm/dirty_background_bytes
echo 200000000 > /proc/sys/vm/dirty_bytes 

The process doing the copy will not be able to write too much at once: it will write a piece of data, then ensure that data is flushed to disk before writing the next piece. This keeps the journaling thread from having too much queued up, so it can still serve requests from other processes while the copy is running.
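To make these limits survive a reboot, the same values can go into /etc/sysctl.conf (a config sketch; the values simply mirror the echo commands above, tune them to your workload):

```
vm.dirty_background_bytes = 100000000
vm.dirty_bytes = 200000000
```

Apply them without rebooting with `sysctl -p`. Note that setting the `*_bytes` variants zeroes out the older `vm.dirty_ratio` / `vm.dirty_background_ratio` percentage knobs; only one form is in effect at a time.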

Another thing you can try is doing the copy with dd, making sure you write synchronously:

dd if=foo of=bar bs=4096 oflag=sync

This also ensures that blocks are written out little by little.
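To see whether dirty pages are actually piling up during a copy, you can watch the kernel's writeback counters (a generic diagnostic, not something from the original answer):

```shell
# Dirty: data waiting to be written back; Writeback: data being written right now (in kB).
grep -E '^(Dirty|Writeback):' /proc/meminfo
```

Run it under `watch -n1` while the copy is in progress; if Dirty keeps climbing toward the dirty_bytes ceiling, writers are being throttled by writeback, which matches the symptom described here.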

Also, if it fits your use case and you understand the risk, you can stop journaling file data on the destination file system (metadata is still journaled). Remount your (I suppose ext3?) file system with the option

data=writeback

This is what I tried on my system. Since this question was opened 2 years ago: have you found a solution in the meantime?

freedge
  • I was unable to find a proper solution, so I switched to another dedicated server (with SSDs) and haven't had any problems since then. – Vald Jan 28 '16 at 15:43