It seems that CPU speed increases have outpaced disk speed increases for a while. Assuming a desktop or laptop with a modern dual-core Intel/AMD CPU and a single average SATA disk, would compressing most or all of the disk give better overall performance? Basically, does the reduced disk bandwidth more than make up for the increased CPU load? I'm sure the real answer is "it depends on what you're doing". By asking this question, I'm hoping to have someone who has done this pipe up and give some examples or pitfalls.

  • define performance? As in speed increase or space increase? You probably wouldn't notice any speed increase but would definitely find the spare bytes useful! :-p – Christopher Lightfoot Sep 03 '09 at 10:58

Yes, disk compression can provide better performance under particular circumstances:

  • Your application is disk throughput bound: modern CPUs and (de)compression algorithms can run at much higher bandwidth than modern disks in long transfers. Any reduction at all in the amount of data moving to or from disk platters is a win in this circumstance
  • It takes less time to (de)compress data that's going to disk platters than the difference in transfer times, and you have CPU cycles to spare
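The second condition can be written down as a rough break-even check. The sketch below is a simplified model, not a benchmark: it assumes the transfer and the (de)compression run one after the other rather than overlapping, and the function names and all figures in the usage note are illustrative assumptions, not measurements.

```python
def io_time_seconds(size_mb, disk_mb_s, ratio=1.0, codec_mb_s=None):
    """Estimate the time to move size_mb of data to or from disk.

    ratio      -- compressed size / original size (1.0 means no compression)
    codec_mb_s -- (de)compression throughput, measured on uncompressed data;
                  None means there is no compression step at all
    Assumes transfer and (de)compression happen sequentially.
    """
    transfer = (size_mb * ratio) / disk_mb_s
    codec = 0.0 if codec_mb_s is None else size_mb / codec_mb_s
    return transfer + codec


def compression_wins(size_mb, disk_mb_s, ratio, codec_mb_s):
    """True if the compressed path is faster than the plain path in this model."""
    plain = io_time_seconds(size_mb, disk_mb_s)
    packed = io_time_seconds(size_mb, disk_mb_s, ratio, codec_mb_s)
    return packed < plain
```

With assumed figures of an 80 MB/s disk and a 400 MB/s codec, `compression_wins(1000, 80, 0.5, 400)` is true (8.75 s against 12.5 s), while data that only shrinks to 95% of its size loses: `compression_wins(1000, 80, 0.95, 400)` is false.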

There's a reason both ZFS and Btrfs, both recent green-field designs, include provisions for compression.

In the HPC space, when an application is checkpointing from memory to disk, the CPUs are frequently not doing anything useful at all. This time is essentially pure overhead. Any use of the CPUs to reduce this time is a win.
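As a minimal sketch of the checkpointing idea, assuming the application state is already available as a byte string: compress it with a fast zlib level before writing, so that otherwise idle CPU time is spent reducing the bytes that hit the disk. The function names and the sample state are hypothetical; a real HPC code would more likely use its I/O library's own compression support.

```python
import os
import tempfile
import zlib


def write_checkpoint(path, state, level=1):
    """Compress state (bytes) with a fast zlib level and write it to path.

    Returns the number of bytes actually written to disk.
    """
    packed = zlib.compress(state, level)
    with open(path, "wb") as f:
        f.write(packed)
    return len(packed)


def read_checkpoint(path):
    """Read a checkpoint written by write_checkpoint and return the original bytes."""
    with open(path, "rb") as f:
        return zlib.decompress(f.read())


if __name__ == "__main__":
    # Hypothetical application state: highly repetitive, so it compresses well.
    state = b"\x00" * 1_000_000 + b"simulation-step-42" * 1000
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.bin")
    written = write_checkpoint(path, state)
    assert read_checkpoint(path) == state
    print(f"{len(state)} bytes of state, {written} bytes written")
```

Level 1 is chosen here on the assumption that checkpoint time matters more than the last few percent of compression ratio.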

Phil Miller
  • Media streaming disks are probably the only place where benefits happen as the chunk size is large enough. Standard OS disks will *always* take a hit. – Ryaner Sep 03 '09 at 07:44
    Media streaming is *not* a compelling application for storage-system level compression. The data should already be compressed in a much better application-specific format. – Phil Miller Sep 03 '09 at 13:57

Disk compression will never give you better performance.

It may give you almost no penalty due to fast modern CPUs, but that's an entirely different thing.

You assume that having to transfer less data from/to disk can improve performance, but big data transfers are almost never the I/O bottleneck: the real bottlenecks are seek time and latency. Modern hard disks are really fast at sustained transfers of big files; what slows them down is lots of little transfers from all over the disk.
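The seek-versus-transfer argument can be illustrated with a back-of-the-envelope model. This is only a sketch: it charges one full seek (including rotational latency) per request, and the 10 ms seek time and 100 MB/s sustained rate used in the note below are assumed round numbers, not measurements of any particular disk.

```python
def read_time_ms(total_kb, request_kb, seek_ms, transfer_mb_s):
    """Estimate the time to read total_kb in requests of request_kb each.

    Returns (seek_total_ms, transfer_total_ms), charging one seek per request.
    """
    requests = total_kb / request_kb
    seek_total = requests * seek_ms
    transfer_total = (total_kb / 1024) / transfer_mb_s * 1000
    return seek_total, transfer_total
```

Under those assumptions, reading 100 MB as scattered 4 KB requests gives `read_time_ms(102400, 4, 10, 100)` = 256000 ms of seeking against 1000 ms of transfer, whereas one sequential read gives `read_time_ms(102400, 102400, 10, 100)` = 10 ms against 1000 ms. Compression only shrinks the transfer term, so in this model it helps the sequential case and barely touches the random one.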

Some scenarios:

  • Media files. Those are usually already compressed on their own (JPEG, MPEG, MP3), so compressing them at the filesystem level is not going to help at all; it will instead worsen things, because CPU resources are already needed to encode/decode them.
  • Databases. Those are usually read from/written to in little random bursts, so compressing them will not only have no benefit at all, but will also degrade performance, as the DBMS can't properly identify where on disk the physical data it needs to access are stored.
  • Pagefile. This is usually quite large, but the O.S. needs to address very small chunks of data on it, and needs to do that very precisely ("Read 4K at physical address X"); compressing it is usually not possible, but even if it was, it would be a complete waste of time and resources: it would provide almost zero compression, due to the "complete random data" nature of this file.
    So transferring less data from the disk provides no benefit? – kbyrd Sep 02 '09 at 23:03
  • Edited to answer that :-) – Massimo Sep 02 '09 at 23:07
    "Never" is a very narrow-minded word. Raw bandwidth from the disk and through the PCI bus is often the bottleneck with some of the work I do. Compression can help performance a lot, especially if you have already taken measures to remove some of the other bottlenecks you mention. – JamesRyan Sep 03 '09 at 09:22
  • Nitpick: The pagefile will not be "complete random data", it will probably contain machine code and application data, which should compress reasonably well. Of course, your point about direct addressing still stands. – sleske Sep 03 '09 at 09:24
    I'd also be hesitant to say "never". There may well be scenarios where disk bandwidth is the bottleneck. But you're probably correct that this is not the typical case. – sleske Sep 03 '09 at 09:25
    Disk I/O is almost always a bottleneck in databases. – Nick Kavadias Sep 03 '09 at 11:31
  • Compression may affect cache behaviour (e.g. BTRFS reading larger blocks from disk, a kind of prefetch side-effect), which may benefit or harm the performance, depends on the load. – Artem Oct 06 '13 at 16:41
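How much a given kind of data actually shrinks, which is the point at issue in the media-file scenario and in the pagefile nitpick above, is easy to check empirically. The sketch below uses Python's standard zlib module; the random bytes are only a stand-in for already-compressed media, and the repeated text is only a stand-in for structured, repetitive data, so real files will fall somewhere in between.

```python
import os
import zlib


def compression_ratio(data, level=6):
    """Return compressed size / original size for data under zlib."""
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    # Stand-in for already-compressed media: statistically random bytes.
    random_like = os.urandom(1_000_000)
    # Stand-in for repetitive code or application data (far more regular than real files).
    structured = b"mov eax, [ebx+4]\nadd eax, 1\n" * 40_000
    print(f"random-like: {compression_ratio(random_like):.3f}")
    print(f"structured:  {compression_ratio(structured):.3f}")
```

A ratio close to 1.0 means the CPU time spent compressing bought essentially nothing; a ratio well below 1.0 means fewer bytes have to cross the disk interface.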

There are specific situations where this is already done at the per-application level, such as video compression: a system that couldn't read raw HD-quality video fast enough from a disk can instead read compressed information and expand it using memory and CPU power. There is no reason this couldn't also be the case for other specific situations, but it is best handled at the application level so that the compression methods used are optimized for their purpose.

Keep in mind that the performance overhead of decompression is worthwhile if the overall throughput increases, so the idea shouldn't be dismissed out of hand. I don't think we're ready for general-purpose, performance-boosting compression yet, but it is theoretically possible to trade a resource you have an excess of (CPU and memory) for a boost elsewhere (less total data read from the hard drive).

You answered your own question! "It depends" is indeed the answer.

The best generalization I can make is:

If you have a database application which is disk-read constrained, then yes, performance is better.

I don't think this is the case for most activities you'll be doing on a desktop/laptop.

In my domain (SQL Server) I know that reporting databases under heavy read loads can get better performance if compression is used. I know the same is true for MySQL.

Microsoft have a white paper on their compression features in SQL Server 2008. Not exactly light reading unless you're a DBA, but here's one chart that supports my generalization:

[Chart from the Microsoft white paper; image not preserved]

Nick Kavadias
Microsoft's disk compression is ugly and OLD. Its ratios are hardly comparable even with the ARJ method from the '80s. But even Microsoft's compression CAN provide better performance on very slow (laptop) hard drives, especially if there's enough RAM for write caching to prevent excessive writes.

The write process is a weak spot of any random-access enabled compression method.

So, if you want a compressed drive, you'd better move to some kind of Linux.

Disk compression is also very suitable for RAM-drives, no need to tell you why.

    Could you add some supporting data, maybe performance comparison between the Windows and Linux based solutions? – psarossy Jan 31 '13 at 07:17
  • Yeah, if you're going to bump a 3.5 year old thread, you'd better be bringing some new, hard facts. – MDMarra Jan 31 '13 at 13:04

CPU speeds have always been faster than disk speeds. IMHO, compression is going to increase overhead and thereby decrease performance.

  • but it depends on what you're doing :-) – Josh Sep 02 '09 at 23:02
  • How so? An increased overhead is an increased overhead. You can't buy money by spending money (unless it's counterfeit money, but that's another story). – Mark Henderson Sep 02 '09 at 23:04
  • The function of compressing and decompressing files, regardless of whether or not they're smaller due to the compression, is going to introduce performance overhead. When the file is read from disk into memory it has to be decompressed. When it's written from memory to disk it has to be compressed. – joeqwerty Sep 03 '09 at 00:01
    But if your CPU is sitting doing nothing and your disk bandwidth is the bottleneck, your CPU will end up doing more work but overall performance will increase. It really depends on what sort of data you are retrieving and what you are doing with it. – JamesRyan Sep 03 '09 at 09:18

I was reading something similar to this yesterday regarding OS X and its compression of the filesystem. Basically, the answer revolves around what you want to compress. In this example he's talking about the "FAT" data: file structures, properties, metadata, etc., which, when stored together, can be compressed to save space and be read into the CPU quicker than seeking the head all over the place to find the data for each file...

Anyway, worth a read if you're thinking about such things :-p

But compression isn't just about saving disk space. It's also a classic example of trading CPU cycles for decreased I/O latency and bandwidth. Over the past few decades, CPU performance has gotten better (and computing resources more plentiful—more on that later) at a much faster rate than disk performance has increased. Modern hard disk seek times and rotational delays are still measured in milliseconds. In one millisecond, a 2 GHz CPU goes through two million cycles. And then, of course, there's still the actual data transfer time to consider.
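The cycle arithmetic in the passage above can be checked directly. This is only a worked calculation; the helper name is hypothetical, and the 10 ms delay in the note below is an assumed round figure for a seek plus rotational delay, not a number from the article.

```python
def cycles_during(delay_ms, clock_ghz):
    """CPU cycles that elapse during delay_ms at a clock rate of clock_ghz.

    One GHz is one million cycles per millisecond.
    """
    return round(delay_ms * clock_ghz * 1_000_000)
```

`cycles_during(1, 2)` reproduces the two million cycles quoted above, and an assumed 10 ms disk delay would leave a 2 GHz core with `cycles_during(10, 2)` = 20,000,000 cycles to spend on decompression.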

Granted, several levels of caching throughout the OS and hardware work mightily to hide these delays. But those bits have to come off the disk at some point to fill those caches. Compression means that fewer bits have to be transferred. Given the almost comical glut of CPU resources on a modern multi-core Mac under normal use, the total time needed to transfer a compressed payload from the disk and use the CPU to decompress its contents into memory will still usually be far less than the time it'd take to transfer the data in uncompressed form.

That explains the potential performance benefits of transferring less data, but the use of extended attributes to store file contents can actually make things faster, as well. It all has to do with data locality.

If there's one thing that slows down a hard disk more than transferring a large amount of data, it's moving its heads from one part of the disk to another. Every move means time for the head to start moving, then stop, then ensure that it's correctly positioned over the desired location, then wait for the spinning disk to put the desired bits beneath it. These are all real, physical, moving parts, and it's amazing that they do their dance as quickly and efficiently as they do, but physics has its limits. These motions are the real performance killers for rotational storage like hard disks.

The HFS+ volume format stores all its information about files—metadata—in two primary locations on disk: the Catalog File, which stores file dates, permissions, ownership, and a host of other things, and the Attributes File, which stores "named forks."

Extended attributes in HFS+ are implemented as named forks in the Attributes File. But unlike resource forks, which can be very large (up to the maximum file size supported by the file system), extended attributes in HFS+ are stored "inline" in the Attributes File. In practice, this means a limit of about 128 bytes per attribute. But it also means that the disk head doesn't need to take a trip to another part of the disk to get the actual data.

As you can imagine, the disk blocks that make up the Catalog and Attributes files are frequently accessed, and therefore more likely than most to be in a cache somewhere. All of this conspires to make the complete storage of a file, including both its metadata and its data, within the B-tree-structured Catalog and Attributes files an overall performance win. Even an eight-byte payload that balloons to 25 bytes is not a concern, as long as it's still less than the allocation block size for normal data storage, and as long as it all fits within a B-tree node in the Attributes File that the OS has to read in its entirety anyway.

There are other significant contributions to Snow Leopard's reduced disk footprint (e.g., the removal of unnecessary localizations and "designable.nib" files) but HFS+ compression is by far the most technically interesting.

From: http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/3


Doubtful. Compression and decompression involve more than just the disk and the CPU; in particular, there will be a lot of data transferred to and from memory (in addition to the standard transfer overhead without compression), which will really hurt in terms of page faults.

Maximus Minimus
In short, no, you probably won't gain in performance.

While compression will improve the performance of your storage, it will put significant extra load on your processor. It probably comes down to what type of files you are going to be decompressing. If you are only dealing with Word, Excel and other basic file types, then go ahead and compress them. If the individual files are bulkier, you're going to be sacrificing more of your time.

