Running Postfix on Ubuntu, sending a lot of mail (~1 million messages) per day. Load averages are extremely high, but CPU and memory usage are low. Is anyone in a similar situation who knows how to remove the bottleneck?

All mail on this server is outbound.

I would have to assume the bottleneck is disk.

Update: here is what iostat -x looks like:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.12   99.88    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    12.38    0.00    2.48     0.00   118.81    48.00     0.00    0.00   0.00   0.00
sdb               1.49    22.28   72.28   42.57   629.70  1041.58    14.55   135.56  834.31   8.71 100.00

Are these numbers in line with the performance you would expect from a single disk?

sdb is dedicated to postfix.

I think it is queue shuffling, from incoming->active->deferred

More details from questions:

Server: Quad-core Xeon(R) CPU E5405 @ 2.00GHz with 4 GB RAM

Load average: 464.88, 489.11, 483.91 on 4 cores, but memory and CPU utilization are minimal.

Postfix instances: between 16 and 32

  With a 400+ load I'm surprised the system is doing anything. If you're sending OUT 1 million messages a day through one system, I would definitely suggest improving your disk I/O (ramdisk, RAID), and probably moving to a more clustered setup. I'm sure that at a load of 400 your server is moving mail quite slowly.
  • @Brian G: You can flag a comment, but I don't think you can delete it. I agree with him, though. – womble Jul 10 '09 at 00:47

This may sound a bit crazy, but you should:

  1. Turn down logging to the bare minimum you need. Make syslog only log mail.err or higher.
  2. Add more RAM. Yes, Postfix doesn't need it, but extra RAM means extra page cache for the kernel.
  3. You didn't mention what filesystem is on /dev/sdb (which matters somewhat too), but definitely mount it with noatime, which should reduce the load at least a little.
  4. See how big your /var/spool/postfix is. If it's under a couple of gigabytes, consider moving it to a ramdisk.
  • Couldn't have said it better myself. I noticed #3 as well: sda and sdb with no partitions could be causing some slowdown, or at least it's not an efficient use of the disks in the system. – grufftech Jul 09 '09 at 23:03
  • Never mind, my mistake: that's iostat -x output rather than plain iostat. – grufftech Jul 09 '09 at 23:05
  • There shouldn't be any reason to try and reduce the amount of logging, as long as you have syslog logging asynchronously and (preferably) have the logs and spool on different spindles. Do make sure you're not doing any verbose logging for normal operation, though. – Rob Chanter Jul 10 '09 at 03:58
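To check point 4 before deciding anything, `du -sh /var/spool/postfix` is the quickest answer. The Python sketch below does the same walk and may be handy inside a monitoring script; the function name `dir_size_bytes` is hypothetical, the path assumes the Debian/Ubuntu default spool location, and it usually needs root, because unreadable queue directories are silently skipped and would make the total too low.

```python
import os


def dir_size_bytes(path):
    """Sum the apparent sizes of all regular files under path (like du --apparent-size)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Queue files move constantly, so tolerate files vanishing mid-walk.
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass
    return total


if __name__ == "__main__":
    # Assumed spool path; run as root or unreadable subdirectories are skipped.
    size = dir_size_bytes("/var/spool/postfix")
    print(f"{size / 2**30:.2f} GiB")
```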

I have to disagree with those that have suggested using a RAM disk for "/var/spool/postfix". This means that your entire mail queue will be stored in RAM. If your server crashes, or loses power, messages in the queue are gone forever. This is really bad from the client/user perspective because the message has already been successfully accepted for delivery. Worse, your server will not send a notice stating that an email bounced or couldn't be delivered because the queue will be empty when the server comes back up.

Instead, I'd add as many fast disks as you can afford; I can't really estimate how many you'll need with the information given. From the iostat output above, it looks like you're doing ~115 IOPS to sdb (the sum of r/s and w/s). You can reasonably estimate that a single 15k RPM SCSI or FC disk will handle about 150 IOPS. I would start with five 15k RPM SCSI disks and a decent RAID controller, set up as RAID-10 across four drives with one hot spare. I'm not sure that this will completely solve your problem, but it definitely won't make it worse.
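A rough way to turn that estimate into a disk count, as a Python sketch. It assumes a RAID-10 write penalty of 2 (each logical write goes to both halves of a mirror), ignores any controller write cache, and reuses the ~150 IOPS per 15k disk figure from the paragraph above; the function name is hypothetical. Keep in mind that the iostat numbers are what a saturated disk managed (100% util, a queue of ~135 requests), so real demand is higher and you should plug in a larger figure.

```python
import math


def raid10_disks(reads_per_s, writes_per_s, iops_per_disk):
    """Rough RAID-10 disk count for a random-I/O workload (hot spares not included)."""
    # Every write lands on both sides of a mirror: back-end load = reads + 2 * writes.
    backend_iops = reads_per_s + 2 * writes_per_s
    disks = math.ceil(backend_iops / iops_per_disk)
    disks = max(disks, 4)        # smallest useful RAID-10 is 4 drives
    return disks + (disks % 2)   # mirrored pairs need an even count


# r/s and w/s for sdb from the iostat -x output; 150 IOPS per 15k disk as estimated.
print(raid10_disks(72.28, 42.57, 150))
```

With the observed numbers this only asks for the 4-drive minimum, which is why headroom for the unmet demand matters more than the exact formula.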


Run postfix under some profiler (gprof?), or look in the logs. Postfix logs a lot of timing information that might tell you where the hold-up is. Common places to look are:

  1. Disk performance. Might be time for RAID-10 for your queue.
  2. Any kind of network I/O on messages. DNS blacklists? SAV (sender address verification)?
  3. Milters and other filters you've installed.
  4. Authentication and UID lookups being done over the network or to a process (ldap, sql).
  5. Not using proxy: (proxymap) for slow lookup tables (like the above).
  • use something like `iostat -x -v 3` to check disk utilization. – moshen Jul 09 '09 at 17:47
  • With iostat -x it's definitely disk performance: 100% util on the disk. – grufftech Jul 09 '09 at 23:09
  • Go out and buy 4 15k SAS drives if your machine will take them, or 4 Velociraptor SATA drives if no SAS. RAID-10 them, mount as the postfix queue. If that doesn't do it, look into the Intel SSDs, but your world is going to be expensive pain at that point. – Bill Weiss Jul 10 '09 at 03:41
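On the "look in the logs" suggestion: since Postfix 2.3, each delivery line carries `delays=a/b/c/d`, where a is the time before the queue manager (including receiving the message), b the time in the queue manager, c the connection setup (DNS, HELO, TLS) and d the message transmission. A minimal Python sketch that averages those four components over a log file follows; the function and label names are hypothetical. One rough reading: a large b means messages wait for a free delivery agent, a large c points at DNS or the network.

```python
import re
import sys

# delays=a/b/c/d as logged by Postfix 2.3 and later.
DELAYS_RE = re.compile(r"\bdelays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)")
LABELS = ("before_qmgr", "in_qmgr", "conn_setup", "transmission")


def average_delays(lines):
    """Mean seconds per delay component over all lines that contain delays=."""
    totals = [0.0, 0.0, 0.0, 0.0]
    count = 0
    for line in lines:
        m = DELAYS_RE.search(line)
        if m:
            count += 1
            for i, value in enumerate(m.groups()):
                totals[i] += float(value)
    if count == 0:
        return {}
    return {label: total / count for label, total in zip(LABELS, totals)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is an example): python delays.py /var/log/mail.log
    with open(sys.argv[1], errors="replace") as log:
        for label, secs in average_delays(log).items():
            print(f"{label:>13}: {secs:.3f}s")
```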

A million messages a day is about 11.6 per second, assuming throughput is constant. Postfix by itself should be able to handle at least an order of magnitude more than that on entry-level server hardware. So I suspect you have more than just Postfix running, or very unevenly distributed throughput peaks.
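The arithmetic, with an optional peak-to-average factor to account for uneven traffic. The factor of 3 below is purely illustrative, since the actual traffic shape isn't known.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def messages_per_second(per_day, peak_factor=1.0):
    """Average delivery rate, optionally scaled by a peak-to-average factor."""
    return per_day / SECONDS_PER_DAY * peak_factor


print(f"{messages_per_second(1_000_000):.2f} msg/s average")        # 11.57
print(f"{messages_per_second(1_000_000, peak_factor=3):.2f} msg/s")  # 34.72 at a 3x peak
```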

Your situation certainly looks like a heavily I/O-bound server. This is to be expected with an MTA, which needs to make lots of small writes to guarantee that it won't lose mail.

Take time to tune I/O on both /var/spool/postfix and /var/log. Best practice for busy Postfix servers is to separate the two across different spindles, and to make sure that asynchronous logging is enabled. On Linux, prefix the log file name for your mail log with a dash in the syslog configuration:

mail.info                              -/var/log/mail.log

or similar.

If you're using amavisd-new, make sure its work area is on a tmpfs filesystem. We usually put it on /tmp/vscan/. This is safe, since amavisd-new doesn't return an end-of-data response until the downstream (post-filter) hop has accepted the message.

Some people recommend noatime mount options for the postfix spool. This is potentially unwise, due to the way postfix depends on file system semantics. See for example http://archives.neohapsis.com/archives/postfix/2006-01/1916.html.

It definitely looks like your disk subsystem should at least be looked at as part of the problem. Due to the way postfix shuffles files around /var, I would suggest googling for "tweak ext3 filesystem" (at least setting noatime and writeback) to see if you can't boost performance at the filesystem level.

I have two clusters of servers that do double duty as DNS and outbound SMTP servers for customer-destined email. They handle 250k messages daily (2k-10k/hour) with nowhere near that sort of I/O bind-up.

Looks like you've got a dodgy disk. Your server is only doing 72 read requests/s and 42 write requests/s. My Seagate 7200 RPM desktop HDD can do 100+ random read/write requests per second and still cope.
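Another way to read the same numbers: iostat's %util is roughly IOPS multiplied by the average service time (svctm). The sketch below (hypothetical helper names) shows that at 8.71 ms per request the disk can finish only about 115 requests per second, which is what it is doing, so it is saturated rather than underperforming for lack of work. This does not rule out a faulty drive, and svctm is only an estimate, so treat it as a sanity check.

```python
def disk_utilization(reads_per_s, writes_per_s, svctm_ms):
    """Fraction of time the device is busy: IOPS x average service time."""
    return (reads_per_s + writes_per_s) * svctm_ms / 1000.0


def max_iops(svctm_ms):
    """Upper bound on IOPS if every request takes svctm_ms to service."""
    return 1000.0 / svctm_ms


# sdb row from the iostat -x output in the question
print(round(disk_utilization(72.28, 42.57, 8.71), 2))  # 1.0, i.e. ~100% busy
print(round(max_iops(8.71), 1))                         # 114.8 IOPS ceiling
```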

Try mounting the spool on sda and see if the load gets any better.

But before you splash out more money on disks, do the following:

  1. Run qshape active, qshape deferred, and qshape incoming and let us know the total of each command.

    An unusually high number of messages in the deferred queue may mean your mail server is being used by spammers to relay their spam (e.g. sending email to nonexistent domains, which causes Postfix to retry again and again).

  2. Make sure your mail server is not blacklisted (http://www.mxtoolbox.com/blacklists.aspx )

  3. Check DNS response time and run a local DNS cache.

    Mail servers use DNS quite heavily. Run dig somedomain.com mx against a few different hosts. Response time should generally be under 100-400 ms; if it is higher, your DNS may not be performing well. Try a different DNS resolver (you could try Google's or OpenDNS).

  4. Check your network (e.g. with ifconfig) and see how many packet errors there are. Check whether your link is saturated or shaped. Check the mail logs for a high number of timed-out operations. Run tcpdump and make sure packets are not getting lost or retransmitted.

  5. Can you tell us whether the console is responsive (e.g. when you type a command, how quickly does the system respond)?

    Generally, network issues (e.g. DNS) will cause the load to skyrocket while the system stays responsive.
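For step 1, a small Python sketch that collects the three totals in one go. It assumes qshape's usual layout, in which the row labelled TOTAL follows the header and its next column is the overall message count; the function names are hypothetical, and running it needs Postfix's qshape on the PATH (usually as root).

```python
import subprocess


def qshape_total(output):
    """Return the message count from the TOTAL row of qshape output."""
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "TOTAL":
            return int(fields[1])  # column right after the label is the total (T)
    raise ValueError("no TOTAL row found")


def queue_totals(queues=("incoming", "active", "deferred")):
    """Run qshape for each queue and return {queue_name: total_messages}."""
    totals = {}
    for q in queues:
        out = subprocess.run(["qshape", q], capture_output=True,
                             text=True, check=True).stdout
        totals[q] = qshape_total(out)
    return totals
```

Calling `print(queue_totals())` on the mail server gives the three numbers asked for; a deferred total that dwarfs the active one fits the relay-abuse scenario described above.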

or start with

vmstat 1

"iostat 1" suggested by moshen is also good

From your stats, a faster disk subsystem would clearly help: RAID-10 on 6-8 15k RPM disks, perhaps with a controller that has some cache, a couple of gigabytes of memory on board.

Mount your spool directory with the noatime,nodiratime options. Consider tuning or changing your filesystem to handle lots of small (I assume) files.

How many cores in the box, and what is the actual load? What is the actual rate you're getting messages sent out?

Like most, my first thought is disk, so check that.

However, network utilization might be the cause, as may be high interrupt load (bad card?), so check those. I've found that even for a modest mail server, having a fast caching DNS server (I'm partial to "unbound") on the same box helps to alleviate latency and network load.
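To see whether DNS latency matters, here is a rough Python sketch that times lookups. Its limits: `socket.gethostbyname` does an A-record lookup through the system resolver, not the MX lookup that `dig somedomain.com mx` performs, and the function names and the 400 ms threshold (the upper figure quoted earlier in this thread) are assumptions. Running it twice against the same names is a quick way to see whether a local cache is actually answering.

```python
import socket
import time


def time_lookups(hostnames, resolve=socket.gethostbyname):
    """Return {hostname: milliseconds} for one lookup of each name."""
    timings = {}
    for name in hostnames:
        start = time.perf_counter()
        try:
            resolve(name)
        except OSError:
            pass  # failures are timed too; a slow NXDOMAIN is still slow
        timings[name] = (time.perf_counter() - start) * 1000.0
    return timings


def slow_lookups(timings, threshold_ms=400.0):
    """Names whose lookup took longer than threshold_ms, sorted."""
    return sorted(name for name, ms in timings.items() if ms > threshold_ms)
```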

  load average: 464.88, 489.11, 483.91, 4 cores. but the memory utilization and cpu is minimal.
  Ouch. How many postfix procs do you have running at any given time? Maybe tuning down the number of processes running at once will ease up on the disk i/o contention a bit. Fewer procs, but each one can go a little faster. That, or some other Postfix throttling mechanism, like limiting the load cut-off to something reasonable.
  16-32 postfix instances.
  • 4xx load average isn't "extremely high", it's "my server is hosed" :)

Looks like a storage performance bottleneck to me.

The iowait of 99.88% tells you that your system is spending a lot of time waiting on your storage.
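This is also how the load can sit near 480 while the CPUs are idle: on Linux, the load average counts tasks that are runnable (state R) plus tasks in uninterruptible sleep (state D), which usually means waiting on disk I/O. `ps -eo state,comm | grep ^D` shows the blocked processes; the Python sketch below counts them from /proc instead. The helper names are hypothetical, and it counts processes rather than individual threads, so it is an approximation of what feeds the load average.

```python
import os


def task_state(stat_text):
    """Extract the one-letter state from a /proc/<pid>/stat line.

    The command name is in parentheses and may itself contain spaces or ')',
    so the state is taken from just after the *last* ')'.
    """
    return stat_text[stat_text.rfind(")") + 2]


def count_states(proc="/proc"):
    """Count running (R) and uninterruptible (D) processes on a Linux system."""
    counts = {"R": 0, "D": 0}
    if not os.path.isdir(proc):
        return counts
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "stat")) as f:
                state = task_state(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state in counts:
            counts[state] += 1
    return counts
```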

I agree with Bill Weiss. You should look into a raid10 setup for the queue.

You really need to get a faster disk, or preferably move to a raid solution. What sort of server is this?


If you are running amavis for spam and virus filtering, you should increase the number of concurrent amavis processes. Depending on your setup, you may need to increase both the number of smtp-amavis processes in Postfix's master.cf and the relevant setting in amavis.conf.

With you doing roughly 630 sectors read and 1,040 sectors written per second (about 72 reads and 43 writes per second), I definitely suggest bumping up the memory in the system (to better handle the OS and a RAM drive) and then making your Postfix folder a ramdisk.
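The rsec/s and wsec/s columns in iostat count 512-byte sectors, not requests. Converting them, as in this small Python sketch (helper names are hypothetical), shows how little data is actually moving, and that the average request is only about 7 KiB, consistent with the avgrq-sz of 14.55 sectors. The disk is limited by the number of small random I/Os, not by bandwidth.

```python
SECTOR_BYTES = 512  # iostat's rsec/s and wsec/s count 512-byte sectors


def sectors_to_kib(sectors_per_s):
    """Convert a sectors-per-second figure to KiB per second."""
    return sectors_per_s * SECTOR_BYTES / 1024


def avg_request_kib(rsec, wsec, r_per_s, w_per_s):
    """Average request size in KiB (iostat reports the same thing as avgrq-sz, in sectors)."""
    return sectors_to_kib(rsec + wsec) / (r_per_s + w_per_s)


# sdb row from the question's iostat -x output
print(round(sectors_to_kib(629.70)), "KiB/s read")      # 315
print(round(sectors_to_kib(1041.58)), "KiB/s written")  # 521
print(round(avg_request_kib(629.70, 1041.58, 72.28, 42.57), 1), "KiB per request")  # 7.3
```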

Would also suggest putting your mail logs on their own partition if not their own disk entirely.

This isn't an IO problem, it's a postfix configuration problem. You're asking it to do too much all at once and creating a bottleneck for yourself. Check out the postfix performance tuning readme and/or post your main.cf so we can help.

