4

I have a Java application performing a large volume (hundreds of MB) of continuous output (streaming plain text) to about a dozen files on a SAN filesystem. Occasionally, this application pauses for several seconds at a time. I suspect that something related to vxfs (Veritas File System) functionality (and/or how it interacts with the OS) is the culprit.

What steps can I take to confirm or refute this theory? I am aware of iostat and /proc/diskstats as starting points.

Revised title to de-emphasize journaling and emphasize "stalls"

I have done some googling and found at least one article that seems to describe behavior like what I am observing: Solving the ext3 latency problem

Additional Information

  • Red Hat Enterprise Linux Server release 5.3 (Tikanga)
  • Kernel: 2.6.18-194.32.1.el5
  • Primary application disk is fiber-channel SAN: lspci | grep -i fibre >> 14:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
  • Mount info: type vxfs (rw,tmplog,largefiles,mincache=tmpcache,ioerror=mwdisable) 0 0
  • cat /sys/block/VxVM123456/queue/scheduler >> noop anticipatory [deadline] cfq
noahz
  • How about posting some of that iostat and diskstats goodness? Maybe posting what hardware you're using? Is this a virtual machine? Physical machine? Ambiguous questions get ambiguous and unhelpful answers. – Magellan Nov 10 '12 at 17:44
  • @Adrian For realz... We have answers here that are all over the board because the basic information hasn't been provided. – ewwhite Nov 10 '12 at 20:22
  • Will provide more details Mon or Tues. – noahz Nov 11 '12 at 00:59
  • Have you ruled out garbage collection? Several-second stalls are a common symptom of GC abuse. – Javier Nov 11 '12 at 04:15
  • Obviously there are many things that can cause latency. Here, I'm focusing just on ext3 filesystem diagnosis. – noahz Nov 11 '12 at 08:21
  • Updated with some server specs, forget about ext3 (sorry!) – noahz Nov 12 '12 at 21:35
  • It looks more and more like this has nothing to do with disk I/O and everything to do with your app. – Javier Nov 13 '12 at 15:00
  • Ok noted. I'm going to select the answer that had the most information about disk configuration / monitoring. – noahz Nov 14 '12 at 03:13

7 Answers

4

Well, one easy test would be to mount that ext3 filesystem as ext2 and then profile the application's performance.
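
A minimal sketch of that test, assuming a hypothetical device /dev/sdb1 mounted at /data (the ext2 driver can mount a cleanly unmounted ext3 filesystem, ignoring the journal):

umount /data
mount -t ext2 /dev/sdb1 /data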

EEAA
  • Yeah...no can do. Production environment. I need to troubleshoot the server in-place. – noahz Nov 10 '12 at 00:09
  • Well, test it in your dev environment. – EEAA Nov 10 '12 at 00:09
  • Was hoping for a stat in /proc/diskstats or something similar that can show % of time spent in journaling operations. So that if/when I run this test, I can understand what's going on. – noahz Nov 10 '12 at 00:49
  • That wouldn't help since it wouldn't tell you if that time was spread out or concentrated. You need a test environment to test in. – David Schwartz Nov 10 '12 at 00:53
  • Apologies for not having more information when I originally posted this question. The filesystem is actually vxfs (Veritas File System). – noahz Nov 13 '12 at 01:13
4

The answer is "Yes" (journaling ALWAYS adds latency :-)

The question of how significant it is can really only be answered by a direct test, but as a general rule of thumb, assume that every (journaled) operation takes around twice as long as it would without journaling enabled.

Since you mentioned in your comments on another answer that you can't do the direct test in your production environment (and presumably don't have a dev/test environment you can use) you do have one other option: Look at your disk statistics and see how much time you spend writing to the journal device.
Unfortunately this only really helps if your journal device is discrete and can be instrumented separately from the "main" disk.
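
With an external journal on its own device, iostat can show how busy that device is (assuming a hypothetical journal device sdc; watch the await and %util columns):

iostat -x sdc 5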


Second time I'm plugging a McKusick video today, but if you wade through this video there's a great discussion of some of the work a journaling filesystem has to do (and the performance impact involved).
Not directly useful/relevant to you and your particular question, but a great general background on filesystems and journaling.

voretaq7
  • Aside from incremental writes, what about pauses lasting several seconds? Any way to detect if / why this is happening? – noahz Nov 10 '12 at 02:43
  • @noahz you are asking for a list of infinite things (I can't eliminate enough possibilities to give you a reasonable guess - it could be disk bottlenecks, journaling, something in the JVM, something in the Java code, etc.) -- you can attach a debugger (like dtrace) to the process and/or kernel and watch what happens when it chokes, but that's about it... – voretaq7 Nov 10 '12 at 02:53
  • journalling adds (minimal) latency, but not 'stalls'. – Javier Nov 10 '12 at 18:03
  • Changed title to de-emphasize journaling and emphasize "stalls." – noahz Nov 12 '12 at 19:26
4

Yes, journaling causes latency. But it's a small piece of the equation. I'd consider it the 5th or 6th item to look at... However, this is another in a trend of systems storage questions that do not include enough relevant information.

  • What type of server hardware are you using? (make and model)
  • Please describe the storage setup (RAID controller, cache configuration, number and arrangement of disks)
  • What operating system are you using? Distribution and kernel versions would be helpful.

Why do I ask for this information?

Your hardware setup and RAID level can have a HUGE impact on your observed performance. Read and write caching on hardware RAID controllers can and should be tuned to accommodate your workload and I/O patterns. The operating system matters because it impacts the tool recommendations and tuning techniques that would be helpful to you. Different distributions and kernels have different default settings, thus performance characteristics vary between them.

So in this case, there are a number of possibilities:

  • Your RAID array may not be able to keep up with the workload (not enough spindles).
  • Or you could benefit from write caching.
  • You may have fragmentation issues (how full is the filesystem?).
  • You could have an ill-fitting RAID level that's counter to the requisite performance characteristics.
  • Your RAID controller may need tuning.
  • You may need to change your system's I/O scheduler and run some block-device tuning.
  • You could consider a more performance-optimized filesystem like XFS.
  • You could drop the journal and remount your filesystems as ext2. This can be done on the fly.
  • You might have cheap SATA disks that may be experiencing bus timeouts.

But as-is, we don't have enough information to go on.
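
A few quick commands that gather the kind of information requested above (all standard RHEL tools; dmidecode needs root):

cat /etc/redhat-release                  # distribution release
uname -r                                 # kernel version
dmidecode -s system-product-name         # server make and model
lspci | grep -i -e raid -e fibre         # storage controllers / HBAs
df -h                                    # how full each filesystem is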

ewwhite
  • Hmm... it's a SAN. Getting all that information will take time; it's not readily available. – noahz Nov 10 '12 at 05:49
  • I hadn't even considered the possibility of this running on a SAN. There are a [few things you can change](http://serverfault.com/questions/430955/openfiler-iscsi-performance/431112#431112), depending on your operating system distribution and version. Can you get that information? – ewwhite Nov 10 '12 at 09:25
4

My guess is that there's some other process that hogs the disk I/O capacity for a while. iotop can help you pinpoint it, if you have a recent enough kernel.
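
For example (iotop needs kernel I/O accounting support, generally 2.6.20 or newer, so it may not be available on a stock 2.6.18-based kernel):

iotop -o    # -o / --only: show only processes currently doing I/O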

If this is the case, it's not about the filesystem, much less about journalling. It's the I/O scheduler that is responsible for arbitrating between conflicting applications. An easy test: check the current scheduler and try a different one. This can be done on the fly, without restarting. For example, on my desktop, to check the first disk (/dev/sda):

cat /sys/block/sda/queue/scheduler
=>  noop deadline [cfq]

shows that it's using CFQ, which is a good choice for desktops but not so much for servers. Better to set 'deadline':

echo 'deadline' > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler
=>  noop [deadline] cfq

and wait a few hours to see if it improves. If so, set it permanently in the startup scripts (the details depend on the distribution).
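
On RHEL 5 there are two common ways to make it permanent, sketched below (the GRUB kernel line is illustrative):

# Option 1: append elevator=deadline to the kernel line in /boot/grub/grub.conf
#   kernel /vmlinuz-2.6.18-194.32.1.el5 ro root=... elevator=deadline
# Option 2: re-apply the setting at boot from /etc/rc.local
echo deadline > /sys/block/sda/queue/scheduler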

Javier
  • Right, and the distribution, version, hardware details are all missing... However, we *do* know that it's a SAN, so the block device will be different. If current redhat/CentOS, there's a cleaner way to handle the I/O scheduler settings. – ewwhite Nov 10 '12 at 18:54
  • Yes, typically SAN devices have their own scheduler, and mixing the two is often seriously non-optimal. Sometimes the best choice by far is to set it to `noop`. – Javier Nov 10 '12 at 20:02
  • You don't know if that's the case here... – ewwhite Nov 10 '12 at 20:07
  • @ewwhite Certainly not; but it's one of the few things that are easy and risk-free to test in production. I've seen a couple of cases where it saved the day, and also cases where it didn't make any perceptible difference. – Javier Nov 11 '12 at 02:01
  • But what I _do_ know is that writing the journal doesn't cause stalls. At worst, it makes every write a little longer (too little to be measurable, in my experience). Usually it's totally masked by cache, either in the filesystem or the block device (especially on SAN-attached devices). – Javier Nov 11 '12 at 02:04
2

I had this issue on Red Hat 4 with ext3 filesystems: many writes on one ext3 filesystem => big waits on writes to another ext3 filesystem.

Because of access-time updates, read access can also be suspended => workaround: mount -o noatime.
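
A minimal sketch of that workaround (assuming a hypothetical filesystem mounted at /data; it takes effect immediately, without a reboot):

mount -o remount,noatime /data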

Jerome D
2

You can try looking beyond /proc/diskstats at /proc/meminfo: maybe your write-back buffer grows so large that it has to be flushed. We have had a situation where the write-back ("dirty") buffers were refilled faster than they could be written out. Linux then started more flush threads, making things worse. Limiting the allowed proportion of dirty buffers (beyond which the writing process is paused) helped somewhat with that problem. The other hint I have is correlation: capture the times when I/O is slow and then compare against whatever else happened at the same time. You could try this, for example:

# Log a timestamp and /proc/meminfo every 2 seconds
while sleep 2
do
    (date; cat /proc/meminfo) >> /tmp/your_logfile
done

Then compare against the times when your application seems slow.
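
The dirty-buffer limiting mentioned above is controlled by the vm.dirty_* sysctls; a sketch with illustrative values (not recommendations):

sysctl vm.dirty_ratio vm.dirty_background_ratio   # show the current thresholds
sysctl -w vm.dirty_background_ratio=5             # start background write-back earlier
sysctl -w vm.dirty_ratio=10                       # pause writers and force flushing sooner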

U. Windl
-1

While this is not likely the solution for most people, I thought I'd mention this particular issue I've experienced before as well.

I have had significant I/O problems before when using WD Green drives with Linux software RAID. If that's your setup, I highly advise using WD Red drives instead. With Greens, as your drives age, your array will most likely become unbearably slow, because those drives constantly try to switch themselves off and on to save energy, causing HUGE I/O lag spikes. You'll eventually wear those drives out, too, because they'll rack up a huge load cycle count under S.M.A.R.T.
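
One way to check for this is the load-cycle attribute reported by smartmontools (assuming a hypothetical drive at /dev/sda):

smartctl -A /dev/sda | grep -i load_cycle   # attribute 193 rising fast suggests aggressive head parking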

RedScourge