I've got a Linux server that's running our Bacula backup system. The machine is grinding like mad because it's going heavily into swap. The problem is, it's only using 60% of its physical memory!

Here's the output from free -m:

free -m
             total       used       free     shared    buffers     cached
Mem:          3949       2356       1593          0          0          1
-/+ buffers/cache:       2354       1595
Swap:         7629       1804       5824

and some sample output from vmstat 1:

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  2 1843536 1634512      0   4188   54   13  2524   666    2    1  1  1 89  9  0
 1 11 1845916 1640724      0    388 2700 4816 221880  4879 14409 170721  4  3 63 30  0
 0  9 1846096 1643952      0      0 4956  756 174832   804 12357 159306  3  4 63 30  0
 0 11 1846104 1643532      0      0 4916  540 174320   580 10609 139960  3  4 64 29  0
 0  4 1846084 1640272      0   2336 4080  524 140408   548 9331 118287  3  4 63 30  0
 0  8 1846104 1642096      0   1488 2940  432 102516   457 7023 82230  2  4 65 29  0
 0  5 1846104 1642268      0   1276 3704  452 126520   452 9494 119612  3  5 65 27  0
 3 12 1846104 1641528      0    328 6092  608 187776   636 8269 113059  4  3 64 29  0
 2  2 1846084 1640960      0    724 5948    0 111480     0 7751 116370  4  4 63 29  0
 0  4 1846100 1641484      0    404 4144 1476 125760  1500 10668 105358  2  3 71 25  0
 0 13 1846104 1641932      0      0 5872  828 153808   840 10518 128447  3  4 70 22  0
 0  8 1846096 1639172      0   3164 3556  556 74884   580 5082 65362  2  2 73 23  0
 1  4 1846080 1638676      0    396 4512   28 50928    44 2672 38277  2  2 80 16  0
 0  3 1846080 1628808      0   7132 2636    0 28004     8 1358 14090  0  1 78 20  0
 0  2 1844728 1618552      0  11140 7680    0 12740     8  763 2245  0  0 82 18  0
 0  2 1837764 1532056      0 101504 2952    0 95644    24  802 3817  0  1 87 12  0
 0 11 1842092 1633324      0   4416 1748 10900 143144 11024 6279 134442  3  3 70 24  0
 2  6 1846104 1642756      0      0 4768  468 78752   468 4672 60141  2  2 76 20  0
 1 12 1846104 1640792      0    236 4752  440 140712   464 7614 99593  3  5 58 34  0
 0  3 1846084 1630368      0   6316 5104    0 20336     0 1703 22424  1  1 72 26  0
 2 17 1846104 1638332      0   3168 4080 1720 211960  1744 11977 155886  3  4 65 28  0
 1 10 1846104 1640800      0    132 4488  556 126016   584 8016 106368  3  4 63 29  0
 0 14 1846104 1639740      0   2248 3436  428 114188   452 7030 92418  3  3 59 35  0
 1  6 1846096 1639504      0   1932 5500  436 141412   460 8261 112210  4  4 63 29  0
 0 10 1846104 1640164      0   3052 4028  448 147684   472 7366 109554  4  4 61 30  0
 0 10 1846100 1641040      0   2332 4952  632 147452   664 8767 118384  3  4 63 30  0
 4  8 1846084 1641092      0    664 4948  276 152264   292 6448 98813  5  5 62 28  0

Furthermore, the output of top sorted by CPU time seems to support the theory that swap is what's bogging down the system:

top - 09:05:32 up 37 days, 23:24,  1 user,  load average: 9.75, 8.24, 7.12
Tasks: 173 total,   1 running, 172 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.6%us,  1.4%sy,  0.0%ni, 76.1%id, 20.6%wa,  0.1%hi,  0.2%si,  0.0%st
Mem:   4044632k total,  2405628k used,  1639004k free,        0k buffers
Swap:  7812492k total,  1851852k used,  5960640k free,      436k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+    TIME COMMAND                                                                                                                             
 4174 root      17   0 63156  176   56 S    8  0.0   2138:52  35,38 bacula-fd                                                                                                                            
 4185 root      17   0 63352  284  104 S    6  0.0   1709:25  28,29 bacula-sd                                                                                                                            
  240 root      15   0     0    0    0 D    3  0.0 831:55.19 831:55 kswapd0                                                                                                                              
 2852 root      10  -5     0    0    0 S    1  0.0 126:35.59 126:35 xfsbufd                                                                                                                              
 2849 root      10  -5     0    0    0 S    0  0.0 119:50.94 119:50 xfsbufd                                                                                                                              
 1364 root      10  -5     0    0    0 S    0  0.0 117:05.39 117:05 xfsbufd                                                                                                                              
   21 root      10  -5     0    0    0 S    1  0.0  48:03.44  48:03 events/3                                                                                                                             
 6940 postgres  16   0 43596    8    8 S    0  0.0  46:50.35  46:50 postmaster                                                                                                                           
 1342 root      10  -5     0    0    0 S    0  0.0  23:14.34  23:14 xfsdatad/4                                                                                                                           
 5415 root      17   0 1770m  108   48 S    0  0.0  15:03.74  15:03 bacula-dir                                                                                                                           
   23 root      10  -5     0    0    0 S    0  0.0  13:09.71  13:09 events/5                                                                                                                             
 5604 root      17   0 1216m  500  200 S    0  0.0  12:38.20  12:38 java                                                                                                                                 
 5552 root      16   0 1194m  580  248 S    0  0.0  11:58.00  11:58 java

Here's the same sorted by virtual memory image size:

top - 09:08:32 up 37 days, 23:27,  1 user,  load average: 8.43, 8.26, 7.32
Tasks: 173 total,   1 running, 172 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.6%us,  3.4%sy,  0.0%ni, 62.2%id, 30.2%wa,  0.2%hi,  0.3%si,  0.0%st
Mem:   4044632k total,  2404212k used,  1640420k free,        0k buffers
Swap:  7812492k total,  1852548k used,  5959944k free,      100k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+    TIME COMMAND                                                                                                                             
 5415 root      17   0 1770m   56   44 S    0  0.0  15:03.78  15:03 bacula-dir                                                                                                                           
 5604 root      17   0 1216m  492  200 S    0  0.0  12:38.30  12:38 java                                                                                                                                 
 5552 root      16   0 1194m  476  200 S    0  0.0  11:58.20  11:58 java                                                                                                                                 
 4598 root      16   0  117m   44   44 S    0  0.0   0:13.37   0:13 eventmond                                                                                                                            
 9614 gdm       16   0 93188    0    0 S    0  0.0   0:00.30   0:00 gdmgreeter                                                                                                                           
 5527 root      17   0 78716    0    0 S    0  0.0   0:00.30   0:00 gdm                                                                                                                                  
 4185 root      17   0 63352  284  104 S   20  0.0   1709:52  28,29 bacula-sd                                                                                                                            
 4174 root      17   0 63156  208   88 S   24  0.0   2139:25  35,39 bacula-fd                                                                                                                            
10849 postgres  18   0 54740  216  108 D    0  0.0   0:31.40   0:31 postmaster                                                                                                                           
 6661 postgres  17   0 49432    0    0 S    0  0.0   0:03.50   0:03 postmaster                                                                                                                           
 5507 root      15   0 47980    0    0 S    0  0.0   0:00.00   0:00 gdm                                                                                                                                  
 6940 postgres  16   0 43596   16   16 S    0  0.0  46:51.39  46:51 postmaster                                                                                                                           
 5304 postgres  16   0 40580  132   88 S    0  0.0   6:21.79   6:21 postmaster                                                                                                                           
 5301 postgres  17   0 40448   24   24 S    0  0.0   0:32.17   0:32 postmaster                                                                                                                           
11280 root      16   0 40288   28   28 S    0  0.0   0:00.11   0:00 sshd                                                                                                                                 
 5534 root      17   0 37580    0    0 S    0  0.0   0:56.18   0:56 X                                                                                                                                    
30870 root      30  15 31668   28   28 S    0  0.0   1:13.38   1:13 snmpd                                                                                                                                
 5305 postgres  17   0 30628   16   16 S    0  0.0   0:11.60   0:11 postmaster                                                                                                                           
27403 postfix   17   0 30248    0    0 S    0  0.0   0:02.76   0:02 qmgr                                                                                                                                 
10815 postfix   15   0 30208   16   16 S    0  0.0   0:00.02   0:00 pickup                                                                                                                               
 5306 postgres  16   0 29760   20   20 S    0  0.0   0:52.89   0:52 postmaster                                                                                                                           
 5302 postgres  17   0 29628   64   32 S    0  0.0   1:00.64   1:00 postmaster

I've tried tuning the swappiness kernel parameter to both high and low values, but nothing appears to change the behavior here. I'm at a loss to figure out what's going on. How can I find out what's causing this?

Update: The system is a fully 64-bit system, so there should be no question of memory limitations due to 32-bit issues.

Update2: As I mentioned in the original question, I've already tried tuning swappiness to all sorts of values, including 0. The result is always the same, with approximately 1.6 GB of memory remaining unused.

Update3: Added top output to the above info.

Kamil Kisiel
  • It would appear that Linux is not using the page cache for anything, yet you've still got a large amount of free memory. Something is clearly amiss. – David Pashley Jun 08 '09 at 21:44
  • Can you post some additional Linux OS details? Vendor, release, kernel version, etc.? There are a couple of tools I'd like to suggest, but some of them require a particular kernel version or support library version. – Christopher Cashell Jun 10 '09 at 17:18

11 Answers


You are I/O-bound. Your system is a little life raft, battered in a stormy sea of buffer/cache/VM paging swells that are 100 feet tall.

Wow. Just... wow. You're moving about 100 MB/s through your I/O, you're spending roughly 30% of CPU time in I/O wait, and you have 4 GB of RAM. The backpressure on this server's VM must be enormous. Under "normal" circumstances, as the system begins to buffer/cache, any free RAM you had is going to be eaten alive in less than 40 seconds (4 GB at 100 MB/s).

Would it be possible to post the settings from /proc/sys/vm? This would provide some insight as to what your kernel thinks is "normal".
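One way to collect those settings in one go is a small shell loop. This is only a sketch: the function name dump_vm_settings is made up for this example, and the optional directory argument exists purely so it can be pointed at a copy of the files instead of the live /proc/sys/vm.

```shell
# Print every readable tunable in a directory as "name = value".
# Defaults to /proc/sys/vm; pass another directory to inspect a copy.
dump_vm_settings() {
    dir=${1:-/proc/sys/vm}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Some entries are write-only triggers; skip anything that cannot be read.
        val=$(cat "$f" 2>/dev/null) || continue
        printf '%s = %s\n' "${f##*/}" "$val"
    done
}
```

Running dump_vm_settings with no arguments and pasting the output should be enough. On most systems, sysctl -a filtered for lines starting with "vm." gives much the same list.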

Those postmaster processes also indicate you're running PostgreSQL in the background. Is this normal for your setup? PostgreSQL in a default config will use very little RAM, but once it's re-tuned for speed, it can chew up 25%-40% of your available RAM quickly. So I can only guess, given the number of them in the output, that you're running some kind of production database while you are running backups. This doesn't bode well. Can you give some more info on why it is running? What is the size of the shared memory parameter for all postmaster processes? Would it be possible to shut the service down, or temporarily reconfigure the database to use fewer connections/buffers while the backups are running? This will help to take some of the pressure off the already strained I/O and free RAM. Keep in mind that each postmaster process consumes RAM above and beyond what the database uses for internal caching. So when you make adjustments to memory settings, be careful about which are "shared" and which are "per-process".

If you're using PostgreSQL as part of your backup process, try to re-tune it to accept just the minimum number of connections, and be sure to shrink your per-process parameters down to something reasonable (only a few megs each). The downside to this is that PostgreSQL will spill to disk if it can't work with the dataset in RAM like it wants to, so that will actually increase your disk I/O, so tune carefully.

X11 in and of itself doesn't take much memory, but a full desktop session can consume several megs. Log out any active sessions you have and run your connection from the console or through SSH.

Still, I don't think it's entirely a memory issue. If you are seeing this much I/O wait for extended periods of time (your vmstat samples show roughly 20-35% in the wa column while the backups run), the resulting bottleneck will eventually crush the rest of the system. Much like Darth Vader crushes necks.

[Image: someone on the business end of Darth Vader's death grip]

How many flush threads are you configured for? Use

cat /proc/sys/vm/nr_pdflush_threads

to find out and

echo "vm.nr_pdflush_threads = 1" >> /etc/sysctl.conf

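to set it to a single thread. Note that appending to /etc/sysctl.conf only takes effect at the next boot (or when you run sysctl -p); it does not change the running kernel. Also, on some kernels nr_pdflush_threads is read-only and just reports the current thread count, in which case the write will be rejected. Seeing 1 or 2 in there is not unusual. If you have several cores or lots of spindle/bus capacity for I/O, you'll want to bump these (slightly). More flush threads = more I/O activity, but also more CPU time spent in I/O wait.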

Is it the default value, or have you bumped it? If you've bumped it, have you considered decreasing the number to reduce the amount of pressure on I/O ops? Or do you have a huge number of spindles and channels to work with, in which case, have you considered increasing the number of flush threads?

P.S. you want to set swappiness to the lower values, not the higher values, to prevent swap-out. Highest value = 100 = swap like crazy when it feels right, lowest value = 0 = try not to swap at all.

Avery Payne
  • I'll look at some of your suggestions. No, I'm not crazy and running a production database on the backup system. The PostgreSQL is part of the backup system, as Bacula uses that as its information store for keeping track of what is on what tape, etc. I'll take a look at tuning some of the parameters you specified. The high I/O throughput is a result of other servers dumping data to this server's disk tray, and this server subsequently pulling that data and writing it to an LTO4 tape library. – Kamil Kisiel Jun 10 '09 at 17:00
  • How are the server's disks arranged? Are you using a mirrored drive setup? – Avery Payne Jun 10 '09 at 20:31
  • +1 for purple prose :) – pjc50 Sep 15 '09 at 13:16
  • Yeah, I was feeling a bit creative that day. Sorry about the drama. :) – Avery Payne Sep 15 '09 at 14:12

If you look at the blocks read in per second (bi) under io, it dwarfs the swap-in activity (si) by more than an order of magnitude. I don't think the swap usage is what's causing your disk thrashing; I think you have something running on the box that is simply causing a lot of disk activity (reads).
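To put a number on that comparison, the vmstat samples can be averaged with a few lines of awk. This is a rough sketch: swap_vs_io is a made-up helper name, it assumes the 17-column vmstat layout shown in the question (si in column 7, bi in column 9), and it treats both columns as being in comparable units, which is an approximation.

```shell
# Summarise vmstat samples read from stdin: average swap-in (si),
# average blocks-in (bi), and si as a percentage of bi.
swap_vs_io() {
    awk '
        # Data rows start with a number; the two header rows do not.
        $1 ~ /^[0-9]+$/ && NF >= 17 {
            si += $7; bi += $9; n++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d avg_si=%.0f avg_bi=%.0f si_share=%.1f%%\n",
                   n, si / n, bi / n, (bi > 0 ? 100 * si / bi : 0)
        }'
}
```

Something like vmstat 1 30 | swap_vs_io would feed it live data. Bear in mind that the first row vmstat prints is an average since boot, so it skews a short sample a little.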

I'd investigate the applications running, and see if you can find the culprit.

Christopher Cashell
  • Well, as I said, it's running the bacula backup system. The blocks in are likely the result of the server dumping data to its externally attached SAS disk array. – Kamil Kisiel Jun 08 '09 at 22:25
  • Are you sure the disk is thrashing from the swapping, and not the backups? What other processes are running on the box? If the kernel is new enough, there are some very useful tools out there (iotop) that can dig into the guts of I/O usage, and even set I/O priority (ionice) if you're using the CFQ I/O scheduler. – Christopher Cashell Jun 08 '09 at 22:54

See if this link answers some of your questions. I regularly see Linux paging (not swapping) out memory long before 60% utilization. This is an expected piece of its memory tuning:


But your lack of buffers/cache worries me. That looks very unusual. So I am thinking that something more is amiss.

  • Hey - good call - where's the buffers/cache? Are they turned off? Is something invalidating them constantly? – MikeyB Jun 09 '09 at 16:17

Can you try disabling swap entirely?

swapoff /dev/hdb2

or some such- at least that will validate that it's swapping that's your issue, and not something else.
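Before turning swap off, it may be worth checking that what is currently in swap can fit back into RAM, since swapoff has to page all of it back in. The sketch below does that from /proc/meminfo; swap_fits_in_ram is a hypothetical helper name, and counting MemFree plus Buffers plus Cached as "room" is a simplifying assumption.

```shell
# Report whether the data currently in swap would fit in free RAM.
# Reads a file in /proc/meminfo format (default: /proc/meminfo).
# Exit status 0 means it should fit, 1 means it probably will not.
swap_fits_in_ram() {
    awk '
        $1 == "MemFree:"   { free = $2 }
        $1 == "Buffers:"   { extra += $2 }
        $1 == "Cached:"    { extra += $2 }
        $1 == "SwapTotal:" { stotal = $2 }
        $1 == "SwapFree:"  { sfree = $2 }
        END {
            used = stotal - sfree
            avail = free + extra
            printf "swap_used_kB=%d reclaimable_kB=%d\n", used, avail
            exit (used <= avail ? 0 : 1)
        }' "${1:-/proc/meminfo}"
}
```

With the figures from the top output in the question (1851852k of swap used against 1639004k free and 436k cached), this check fails, so a plain swapoff on that box would be expected to struggle.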

Tim Howland
  • +1 to confirm that the presumed diagnosis is actually the cause of the issue. – Wayne Jun 09 '09 at 02:36
  • I will give this a try tomorrow at work. Also, my swap isn't on /dev/hdb2 ;) – Kamil Kisiel Jun 09 '09 at 03:42
  • It should be noted though that, while being a good diagnosis help, this is very dangerous on a production system. If you really *need* the swap, you will quickly run out of RAM. And then the OOM killer will come and kill off a random process, which might just be your production DB... – sleske Jul 16 '09 at 22:47
  • Agreed- you should not be doing this anywhere near production. – Tim Howland Jul 17 '09 at 12:58

Bacula performance is highly database dependent. Likely, it's PostgreSQL that's killing your server. The high load average and the fairly large % of CPU time spent in wait state clearly show it's waiting for disk I/O... And that's PostgreSQL's doing. For every file in your backup set it's doing at least an UPDATE statement. Don't worry about the swapping.

Do tune the PostgreSQL install. Possibly give individual databases (or even tables) their own disks/RAID sets to spread the I/O around. You can force PostgreSQL to use asynchronous writes if it isn't already... Although that's trading database integrity for write performance. Boost the hell out of the shared memory available to PostgreSQL. That will alleviate at least a lot of the reads on the database. If you've never done it, run VACUUM ANALYZE on the Bacula database as well to give the query optimizer something to work with.

By far, Bacula's weakest point is the database dependencies (and the brain-deadness of some of it...) Run a purge of a recent large backup and notice how long (hours often) it takes to run a couple dozen million queries... Bacula likes comparatively few large files, otherwise it's a dog.


By default swappiness is set to 60.

cat /proc/sys/vm/swappiness
60

Swappiness is a kernel parameter used to tweak how much the kernel favors swap over RAM; high swappiness means the kernel will swap out a lot, and low swappiness means the kernel will try not to use swap space.

We can change this by editing the value of vm.swappiness in /etc/sysctl.conf.
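Editing the file by hand works fine; for scripted setups, something like the following could update the entry without leaving duplicates behind. It is a sketch only: set_sysctl_conf is a made-up helper, and it assumes simple "key = value" lines with no continuation or include syntax.

```shell
# Set KEY to VALUE in a sysctl.conf-style FILE, replacing an existing
# assignment if one is present and appending it otherwise.
set_sysctl_conf() {
    key=$1 value=$2 file=${3:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        {
            # Take the text before "=" (minus leading blanks) as the key.
            line = $0
            sub(/^[ \t]+/, "", line)
            split(line, parts, /[ \t]*=/)
            if (parts[1] == k) {
                if (!done) print k " = " v
                done = 1
                next
            }
            print
        }
        END { if (!done) print k " = " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

For example, set_sysctl_conf vm.swappiness 10 (as root) followed by sysctl -p would persist the value and apply it to the running kernel. Commented-out lines are left alone because their key does not match exactly.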


You can manually set the swappiness of the kernel, which you can see at /proc/sys/vm/swappiness or by issuing the command sysctl vm.swappiness. The swappiness is a kernel setting that determines how much the swap is used.

By setting sudo sysctl vm.swappiness=0 you are effectively deactivating the swap partition. To make this change permanent you can add/modify vm.swappiness=0 in /etc/sysctl.conf. You should see what is a good value for you. I personally have it configured to vm.swappiness=10, the default value being 60.

  • Not quite, with swappiness=0 you're saying *never* swap if there's a way to avoid it, but still swap if the only other option is to fail an allocation or OOM kill. I find a swappiness of 30 is a nice improvement on laptops, but have not had a need to mess with it on other systems. – LapTop006 Jun 10 '09 at 08:58

Another thing you may want to look at is your kernel run queue and uninterruptible processes (the 'r' and 'b' columns in vmstat), which indicate that the system is saturated at times. On a side note, don't confuse saturation with utilization... the real problem may be a starved process stack against the saturated kernel :-(
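A quick way to see how deep those queues get is to average the 'r' and 'b' columns over a run of vmstat samples. The helper below is a sketch with a made-up name (runqueue_summary) and assumes the 17-column vmstat layout shown in the question.

```shell
# Summarise the run queue (r) and blocked (b) columns of vmstat
# samples read from stdin.
runqueue_summary() {
    awk '
        # Data rows start with a number; the two header rows do not.
        $1 ~ /^[0-9]+$/ && NF >= 17 {
            r += $1; b += $2; n++
            if ($2 + 0 > max_b + 0) max_b = $2
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d avg_r=%.1f avg_b=%.1f max_b=%d\n",
                   n, r / n, b / n, max_b
        }'
}
```

Used as vmstat 1 30 | runqueue_summary, a consistently high avg_b alongside a low avg_r would point at processes stuck waiting on I/O rather than competing for CPU.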

You could also run 'pmap -x [PID]' to get additional memory details from some of the more consuming processes. I wish you luck!


Matt Cummings

Maybe you have short-lived processes which use a lot of memory, then exit before you get a chance to notice them.

This would be consistent with what you're seeing anyway.


Have you investigated issues with inode cache? slabtop should at least give you a starting point if you are running into something like this.
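If slabtop isn't handy, a rough equivalent can be pulled straight out of /proc/slabinfo (usually readable only by root). This is a sketch: top_slabs is a made-up name, and the size estimate is simply num_objs times objsize (columns 3 and 4 of the slabinfo 2.x format), which ignores per-slab overhead.

```shell
# List the largest slab caches by approximate size in kB.
# Usage: top_slabs [slabinfo-file] [count]
top_slabs() {
    awk '
        # Skip the version line and the column-header comment.
        $1 == "slabinfo" || $1 ~ /^#/ { next }
        { printf "%d %s\n", $3 * $4 / 1024, $1 }
    ' "${1:-/proc/slabinfo}" | sort -rn | head -n "${2:-5}"
}
```

If inode or dentry caches (for example xfs_inode on an XFS box like this one) dominate the list, that would support the inode cache theory.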

Martin M.

While your system is 64-bit, it might not be able to actually address all of the available memory. This is a chipset limitation. For example, the previous-generation Mac mini "supports" 4 GB of RAM, but only 3.3 GB was actually addressable.

  • It's an SGI Altix XE240, I'm *pretty* sure it can support more than 4 GB of RAM as I've used demo units with 32 GB. – Kamil Kisiel Jun 09 '09 at 21:51
  • 'tis not a chipset limitation in the old mini; that chipset can do 8 GB, but Apple didn't add the addressing lines to handle it properly (IIRC, but there is a general case there among multiple manufacturers) – LapTop006 Jun 10 '09 at 08:59