Understanding RedHat's recommended tuned profiles

Question

We are going to roll out tuned (and numad) on ~1000 servers, the majority of them being VMware servers either on NetApp or 3Par storage.

According to RedHat's documentation we should choose the virtual-guest profile. What it does can be seen here: tuned.conf

We are changing the IO scheduler to NOOP as both VMware and the NetApp/3Par should do sufficient scheduling for us.

However, after investigating a bit I am not sure why they are increasing vm.dirty_ratio and kernel.sched_min_granularity_ns.

As far as I understand, increasing vm.dirty_ratio to 40% means that on a server with 20GB of RAM, 8GB can be dirty at any given time unless vm.dirty_writeback_centisecs is hit first. And while these 8GB are being flushed, all IO for the application will be blocked until the dirty pages are freed.

Increasing the dirty_ratio would probably mean higher write performance at peaks as we now have a larger cache, but then again when the cache fills, IO will be blocked for a considerably longer time (several seconds).
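The 20GB example can be checked with a few lines of Python. This is only a rough sketch: it assumes the percentage applies to total RAM, whereas the kernel applies it to available (free plus reclaimable) memory, so the real threshold will be somewhat lower. The function name is hypothetical.

```python
GIB = 1024 ** 3

def dirty_limit_bytes(mem_bytes: int, dirty_ratio_percent: int) -> int:
    """Rough upper bound on dirty page cache, assuming the ratio
    applies to total RAM (a simplification, see the note above)."""
    return mem_bytes * dirty_ratio_percent // 100

limit = dirty_limit_bytes(20 * GIB, 40)
print(f"{limit / GIB:.1f} GiB")  # prints "8.0 GiB"
```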

The other question is why they are increasing sched_min_granularity_ns. If I understand it correctly, increasing this value will decrease the number of time slices per epoch (sched_latency_ns), meaning that running tasks will get more time to finish their work. I can understand this being a very good thing for applications with very few threads, but for e.g. Apache or other processes with a lot of threads, would this not be counter-productive?
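A simplified model of how CFS derives a time slice may help frame the question. This is a sketch, assuming equally weighted tasks on a single CPU. The function name and the example values (20 ms latency, 4 ms and 10 ms granularity) are illustrative and are not taken from the profile. When more tasks are runnable than fit into sched_latency_ns at the minimum granularity, the period is stretched so that no slice drops below sched_min_granularity_ns.

```python
def cfs_timeslice_ns(nr_running: int, sched_latency_ns: int,
                     min_granularity_ns: int) -> int:
    """Approximate per-task slice for equally weighted tasks (simplified model)."""
    # How many tasks fit into one latency period at the minimum slice (rounded up).
    nr_latency = -(-sched_latency_ns // min_granularity_ns)
    if nr_running > nr_latency:
        # Too many tasks: stretch the period instead of shrinking the slice.
        period_ns = nr_running * min_granularity_ns
    else:
        period_ns = sched_latency_ns
    return period_ns // nr_running

LATENCY_NS = 20_000_000  # illustrative value, not the profile's setting

for granularity_ns in (4_000_000, 10_000_000):
    for tasks in (2, 10):
        slice_ns = cfs_timeslice_ns(tasks, LATENCY_NS, granularity_ns)
        print(granularity_ns, tasks, slice_ns)
```

In this model a larger granularity only changes the slice once many tasks are runnable, and the price is a longer wait before each task gets the CPU again.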

score 12 · Answer 1 · edited Apr 13 '17 at 12:14

Here's the schedule of tuned-adm configurations...

I think it helps to see them in tabular form. The main thing to note is that the default RHEL6 settings suck!! The other thing is that the enterprise-storage and virtual-guest profiles are identical except for reduced swappiness on the virtual guest side (makes sense, right?).

As for a recommendation on storage I/O elevator, you have a few layers of abstraction on the storage layer. Using the noop scheduler would make sense if you were using RDMs or presenting storage directly to your virtual machines. But since they're going to live on NFS or VMFS, I still like the additional tuning options afforded by the deadline scheduler.
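To check which elevator a device is actually using, the kernel exposes the list in /sys/block/<device>/queue/scheduler, with the active one in square brackets. A minimal sketch of reading it follows; the helper names and the device name sda are assumptions for illustration.

```python
import re
from pathlib import Path
from typing import Optional

def active_scheduler(text: str) -> Optional[str]:
    """Return the bracketed (active) scheduler from a line
    such as 'noop anticipatory [deadline] cfq'."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None

def read_active_scheduler(device: str) -> Optional[str]:
    """Read the active scheduler for a block device, e.g. 'sda' (assumed name)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())

if __name__ == "__main__":
    # Parsing a sample line; on a real guest call read_active_scheduler("sda").
    print(active_scheduler("noop anticipatory [deadline] cfq"))  # deadline
```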

Tuned profiles can be changed on-the-fly on running systems, so if you have any concerns, test with your application and specific environment and benchmark.

ok, thank you. Understand why you want `deadline` now :) — espenfjo, Jun 26 '13 at 13:13

suprjami · Answer 2 · 2013-06-28T22:05:11.597

Have a watch of Shak and Larry's performance tuning videos from Summit, they talk about the tuned profiles in depth.

Part 1 - http://www.youtube.com/watch?v=fATEiBJ3pKw
Part 2 - http://www.youtube.com/watch?v=km-vLELmWLs

One of the biggest intended takeaways is that the profiles are only a recommended starting point, not immutable numbers which are magically perfect for every environment.

Start with one profile and have a play around with the settings. Generate a good production-like test workload and measure metrics which are important to your business.

Change one thing at a time and record every result at every iteration. When you're done, review the results and pick the settings that gave the best results. That's your ideal tuned profile.
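The bookkeeping for this can be very small. The sketch below records one run per changed setting and picks the best by a single metric; the class, the field names and the numbers are placeholders for illustration only, not measurements.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Run:
    setting: str       # the single tunable changed in this iteration
    value: str         # the value it was set to
    throughput: float  # whatever metric matters to you; higher is better here

def best_run(runs: List[Run]) -> Run:
    """Return the run with the highest recorded metric."""
    return max(runs, key=lambda run: run.throughput)

# Placeholder numbers, not real benchmark results.
runs = [
    Run("baseline", "-", 100.0),
    Run("vm.dirty_ratio", "40", 110.0),
    Run("elevator", "deadline", 105.0),
]
print(best_run(runs))
```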

Have a link to the Shak & Larry talk? — Aaron Copley, Jun 28 '13 at 14:48
I've added video links to my answer. — suprjami, Jun 28 '13 at 22:05

score 7 · Accepted Answer · answered Jun 26 '13 at 10:05

The short answer is that any tuning is guesswork and only has value when backed up with empirical data: Try it. Measure it. If you don't like it, tweak it.

A longer answer:

> Increasing the dirty_ratio would probably mean higher write performance ...IO will be blocked for a considerably longer time

No. Increasing the dirty ratio means that your system is less likely to get into a state where it needs to start blocking on writes. The downside is that there's more memory used and greater risk of data loss in an outage.

> meaning that running tasks will get more time to finish their work

Processes will usually yield before their time slice expires. The problem with a VM is that your machine may be competing for CPU and L1/L2 cache with other VMs, and high levels of task switching (due to pre-emption) have a big impact on throughput. The kind of applications which are usually deployed into VMs are ones which are CPU bound (web servers, application servers).

Yes, the increase in throughput (which applies to all types of application) will come at the cost of an increase in latency - but the latter is of the order of microseconds when most transactions are taking milliseconds. If you need real time capability/very low latency then you shouldn't be using a VM.

For real. These are just guidelines. Tune to your taste. I still use the recommended "deadline" scheduler for VMs though. — ewwhite, Jun 26 '13 at 11:48
@ewwhite Why would you recommend `deadline` instead of NOOP on VMware with proper storage? — espenfjo, Jun 26 '13 at 12:09
@espenfjo Well, because RedHat recommends `deadline`... but also see my answer. — ewwhite, Jun 26 '13 at 12:42
