Can you post iostat -xdk 1 50 output from when the problem occurs? (See the iostat man page for the switch that lists partition names.) Pastebin it along with top output taken at the same time.
Okay, so your disk seems to become overloaded at certain points in your workload.
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 85.00 5.00 249.00 11.00 6040.00 64.00 46.95 10.73 44.23 3.85 100.00
sda 3.00 0.00 275.00 0.00 7764.00 0.00 56.47 7.63 23.27 3.64 100.00
sda 125.00 29.00 221.00 3.00 5508.00 128.00 50.32 7.49 41.08 4.46 100.00
sda 14.00 65.00 224.00 28.00 5940.00 372.00 50.10 1.97 8.05 3.52 88.80
Comparing the iterations, the read load sporadically becomes very large, and await increases with it. However, the average queue size (avgqu-sz) is still fairly low. That means most of the await time is spent while the storage itself is servicing the requests; it is not being spent on the Linux side, i.e. in the I/O scheduler.
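To make that comparison concrete, here is a small sketch (the field order is simply taken from the Device: header line above) that parses the pasted sample and prints read throughput next to await, so you can see the spike and the latency line up:

```python
# Parse the iostat sample pasted above; field order comes from its header line.
header = "rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util".split()
samples = [
    "sda 85.00 5.00 249.00 11.00 6040.00 64.00 46.95 10.73 44.23 3.85 100.00",
    "sda 3.00 0.00 275.00 0.00 7764.00 0.00 56.47 7.63 23.27 3.64 100.00",
    "sda 125.00 29.00 221.00 3.00 5508.00 128.00 50.32 7.49 41.08 4.46 100.00",
    "sda 14.00 65.00 224.00 28.00 5940.00 372.00 50.10 1.97 8.05 3.52 88.80",
]

rows = []
for line in samples:
    fields = line.split()
    # Skip the device name, map the numeric columns onto the header names.
    rows.append(dict(zip(header, map(float, fields[1:]))))

for r in rows:
    print(f"rkB/s={r['rkB/s']:8.1f}  await={r['await']:6.2f} ms  %util={r['%util']:.1f}")
```

The lines with the heaviest read throughput are also the ones where await climbs into the 40 ms range, which is what the paragraph above is pointing at.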
Roughly speaking, there are two queues: one in the I/O scheduler and one on the hardware side. await is measured per I/O, from the time it enters the I/O scheduler to the time it is completed by the storage, i.e. the disk. avgqu-sz is the average number of I/Os held in both the scheduler queue and the storage LUN queue combined. If avgqu-sz is much less than the storage's queue depth, little time is being spent in the scheduler queue: the scheduler passes the I/Os straight down to the storage, and until the storage completes them, await keeps growing.
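One way to check that these numbers hang together is Little's law: average queue size ≈ arrival rate × average latency. A quick sanity check on the first sample line above (values copied from that line, the 1000 divisor just converts await from milliseconds to seconds):

```python
# Little's law sanity check on the first iostat line above:
#   avgqu-sz ≈ (r/s + w/s) * await / 1000    (await is in milliseconds)
r_per_s, w_per_s = 249.00, 11.00
await_ms = 44.23
reported_avgqu_sz = 10.73

estimated = (r_per_s + w_per_s) * await_ms / 1000.0
print(f"estimated avgqu-sz = {estimated:.2f}, reported = {reported_avgqu_sz}")
```

The estimate comes out around 11.5 versus the reported 10.73, i.e. consistent to within averaging error. And since a typical LUN queue depth is 32 or more, a queue of ~11 outstanding I/Os can sit entirely on the device side, which matches the point about where the await time is accumulating.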
Long story short: in my opinion, the storage itself becomes slow at particular times, and that is what increases the latency.