We have processes doing background writes of big files. We would like those to have minimal impact on other processes.
Here is a test run on SLES11 SP4. The server has a large amount of memory, which allows it to accumulate 4GB of dirty pages.
> dd if=/dev/zero of=todel bs=1048576 count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 3.72657 s, 1.2 GB/s
> dd if=/dev/zero of=zer oflag=sync bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 16.6997 s, 0.0 kB/s
real 0m16.701s
user 0m0.000s
sys 0m0.000s
> grep Dirty /proc/meminfo
Dirty: 4199704 kB
This is my investigation so far:
- SLES11 SP4 (3.0.101-63)
- type ext3 (rw,nosuid,nodev,noatime)
- deadline scheduler
- over 120GB reclaimable memory at the time
- dirty_ratio is set to 40% and dirty_background_ratio 10%, 30s expire, 5s writeback
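For reference, these writeback tunables can be inspected under /proc/sys/vm, and on big-memory machines capped by an absolute size instead of a percentage of RAM. This is only a sketch; the byte values below are examples, not recommendations:

```shell
# Current writeback tunables (this question reports 40 / 10 / 3000 / 500)
cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_expire_centisecs
cat /proc/sys/vm/dirty_writeback_centisecs

# The *_bytes variants override their *_ratio counterparts (needs root):
#   sysctl -w vm.dirty_background_bytes=268435456   # start flushing at 256 MB
#   sysctl -w vm.dirty_bytes=1073741824             # hard limit at 1 GB
```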
Here are my questions:
- since 4GB of dirty memory remain at the end of the test, I conclude that the IO scheduler was not invoked during the test. Is that right?
- since the slowness persists after the first dd finishes, I conclude this issue also has nothing to do with the kernel allocating memory or with any "copy on write" happening when dd fills its buffer (dd always writes from the same buffer).
- is there a way to investigate deeper what is blocked? Any interesting counters to watch? Any idea on the source of the contention?
- we are thinking of either reducing the dirty_ratio values or performing the first dd in synchronous mode. Any other directions to investigate? Is there a drawback to making the first dd synchronous? I'm afraid it would be prioritized over other "legitimate" processes doing asynchronous writes.
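On the "counters to watch" question, a few things worth polling while the stall is happening (a sketch, nothing SLES-specific; the sysrq line needs root and is commented out):

```shell
# Dirty / writeback pages (wrap in `watch -n1` to poll continuously)
grep -E '^(Dirty|Writeback):' /proc/meminfo

# Kernel-wide writeback counters
grep -E '^(nr_dirty|nr_writeback) ' /proc/vmstat

# Per-device utilisation and average wait (sysstat package):
#   iostat -x 1

# Dump stacks of blocked (D-state) tasks to dmesg (needs root):
#   echo w > /proc/sysrq-trigger
```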
see also
https://www.novell.com/support/kb/doc.php?id=7010287
limit linux background flush (dirty pages)
http://yarchive.net/comp/linux/dirty_limits.html
EDIT:
there is an ext2 file system on the same device. On that file system, there is no freeze at all! The only performance impact occurs during the flushing of dirty pages, where a synchronous call can take up to 0.3s, very far from what we experience with our ext3 file system.
EDIT2:
Following @Matthew Ife's comment, I tried doing the synchronous write opening the file without O_TRUNC, and you won't believe the result!
> dd if=/dev/zero of=zer oflag=sync bs=512 count=1
> dd if=/dev/zero of=todel bs=1048576 count=4096
> dd if=/dev/zero of=zer oflag=sync bs=512 count=1 conv=notrunc
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000185427 s, 2.8 MB/s
dd was opening the file with parameters:
open("zer", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC, 0666) = 3
changing with the notrunc option, it is now
open("zer", O_WRONLY|O_CREAT|O_SYNC, 0666) = 3
and the synchronous write completes instantly!
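If strace is available, the difference in open(2) flags between the two variants can be confirmed directly (the file name is just an example; newer kernels may show openat instead of open):

```shell
# Truncating variant: expect O_TRUNC among the flags
strace -e trace=open,openat dd if=/dev/zero of=zer oflag=sync bs=512 count=1 2>&1 | grep '"zer"'

# Non-truncating variant: O_TRUNC should be gone
strace -e trace=open,openat dd if=/dev/zero of=zer oflag=sync bs=512 count=1 conv=notrunc 2>&1 | grep '"zer"'
```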
Well, it is not completely satisfying for my use case (I'm doing an msync in this fashion). However, I am now able to trace what write and msync are doing differently!
final EDIT: I can't believe I hit this: https://www.novell.com/support/kb/doc.php?id=7016100
In fact, under SLES11 dd opens the file with
open("zer", O_WRONLY|O_CREAT|O_DSYNC, 0666) = 3
and on this kernel O_DSYNC == O_SYNC!
Conclusion:
For my use case I should probably use
dd if=/dev/zero of=zer oflag=dsync bs=512 count=1 conv=notrunc
Under SLES11, running oflag=sync is really running oflag=dsync, no matter what strace is saying.