7

We have an Apache setup with a huge disk_cache (>500,000 entries, >50 GB of disk space used). The cache grows by 16 GB every day.

My problem is that the cache seems to be growing nearly as fast as it's possible to remove files and directories from the cache filesystem!

The cache partition is an ext3 filesystem (100 GB, created with "-T news") on iSCSI storage. The Apache server (which acts as a caching proxy) is a VM. The disk_cache is configured with CacheDirLevels=2 and CacheDirLength=1, and includes variants. A typical file path is "/htcache/B/x/i_iGfmmHhxJRheg8NHcQ.header.vary/A/W/oGX3MAV3q0bWl30YmA_A.header".

When I try to call htcacheclean to tame the cache (non-daemon mode, "htcacheclean -t -p/htcache -l15G"), IOwait goes through the roof for several hours, without any visible action. Only after hours does htcacheclean start to delete files from the cache partition, which takes a couple more hours. (A similar problem was brought up on the Apache mailing list in 2009, without a solution: http://www.mail-archive.com/dev@httpd.apache.org/msg42683.html)

The high IOwait leads to problems with the stability of the web server (the bridge to the Tomcat backend server sometimes stalls).

I came up with my own prune script, which removes files and directories from random subdirectories of the cache, only to find that its deletion rate is just slightly higher than the cache growth rate. The script takes ~10 seconds to read a subdirectory (e.g. /htcache/B/x) and frees some 5 MB of disk space. In those 10 seconds, the cache has grown by another 2 MB. As with htcacheclean, IOwait goes up to 25% when the prune script runs continuously.
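
For illustration, a minimal sketch of such a prune script (the 7-day age cutoff and the use of shuf/find are assumptions for the sake of the example, not the exact script used; the real script loops or is re-invoked continuously):

#!/bin/bash
# one pass: pick a random second-level cache directory (e.g. /htcache/B/x)
CACHE=/htcache
DIR=$(find "$CACHE" -mindepth 2 -maxdepth 2 -type d | shuf -n 1)

# delete header/data files older than 7 days (10080 minutes),
# then remove any directories left empty (-delete implies depth-first order)
find "$DIR" -type f -mmin +10080 -delete
find "$DIR" -mindepth 1 -type d -empty -delete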

Any idea?

  • Is this a problem specific to the (rather slow) iSCSI storage?

  • Should I choose a different file system for a huge disk_cache? ext2? ext4?

  • Are there any kernel parameter optimizations for this kind of scenario? (I already tried the deadline scheduler and a smaller read_ahead_kb, without effect; see the sketch below for the kind of settings I mean.)
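
For reference, this is the kind of tuning meant in the last bullet; the device name sdb is only a placeholder for the actual iSCSI block device:

# switch the I/O scheduler of the cache device to deadline (device name is a placeholder)
echo deadline > /sys/block/sdb/queue/scheduler

# lower readahead for that device, e.g. from the common default of 128 KB to 16 KB
echo 16 > /sys/block/sdb/queue/read_ahead_kb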

flight

2 Answers

3

through my recent investigations, triggered by similar travails with htcacheclean, i have concluded that the main problem with cleaning large or deep caches, especially those that involve Vary headers, lies in the design of the utility itself.

based on poking around in the source code, and watching the output from strace -e trace=unlink, the general approach seems to be as follows:

  1. iterate over all top-level directories (/htcache/B/x/, above)
    • delete any .header and .data files for already-expired entries
    • gather the metadata for all nested entries (/htcache/B/x/i_iGfmmHhxJRheg8NHcQ.header.vary/A/W/oGX3MAV3q0bWl30YmA_A.header, above)
  2. iterate over all nested entry metadata and purge those with response time, .header modtime or .data modtime in the future
  3. iterate over all nested entry metadata and purge those that have expired
  4. iterate over all nested entry metadata to find the oldest; purge it; repeat

and any of the last three steps will return from the purging subroutine once the cache size has dropped below the set threshold.
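
for reference, the strace observation mentioned above amounted to something along these lines (this exact invocation is a sketch, not the original command):

# log only unlink() calls made during a cleaning run, for later inspection
strace -f -e trace=unlink -o /tmp/htcacheclean.unlink.log htcacheclean -t -p/htcache -l15G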

so with a fast-growing and/or already-large cache, the rate of growth during the extended time required for step #1 can easily prove insurmountable even once you progress to steps #2-#4.

further compounding the problem, if the size limit still has not been met by the time you reach the oldest-first pass (step #4), the fact that you have to iterate over all of the nested-entry metadata just to find the single oldest entry, delete only that entry, and then do the same thing all over again, means that the cache can keep growing faster than you will ever be able to trim it.

/* process remaining entries oldest to newest, the check for an emtpy
 * ring actually isn't necessary except when the compiler does
 * corrupt 64bit arithmetics which happend to me once, so better safe
 * than sorry
 */
while (sum > max && !interrupted && !APR_RING_EMPTY(&root, _entry, link)) {
    oldest = APR_RING_FIRST(&root);

    for (e = APR_RING_NEXT(oldest, link);
         e != APR_RING_SENTINEL(&root, _entry, link);
         e = APR_RING_NEXT(e, link)) {
        if (e->dtime < oldest->dtime) {
            oldest = e;
        }
    }

    delete_entry(path, oldest->basename, pool);
    sum -= oldest->hsize;
    sum -= oldest->dsize;
    entries--;
    APR_RING_REMOVE(oldest, link);
}

the solution?

obviously fast(er) disks would help. but it is not at all clear to me how much of an increase in IO throughput would be required to overcome the inherent problems in the current approach taken by htcacheclean. no dig against the creators or maintainers, but it sure seems like this design was either not tested against, or not ever expected to perform well against, broad, deep, fast-growing caches.

but what does seem to work, and i am still confirming right now, is to trigger htcacheclean from within a bash script that itself loops over the top-level directories.

#!/bin/bash

# desired cache size in integer gigabytes
SIZE=12;
# divide that by the number of top-level directories (4096),
# to get the per-directory limit, in megabytes
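# (e.g. SIZE=12 -> 12 GiB / 4096 dirs = 3 MiB per directory, i.e. LIMIT=3M)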
LIMIT=$(( $SIZE * 1024 * 1024 * 1024 / 4096 / 1024 / 1024 ))M;

while true;
do
  for i in /htcache/*/*;
  do
    htcacheclean -t -p$i -l$LIMIT;
  done;
done;

basically, this approach allows you to get to the purging steps (#2-#4) much more quickly and frequently, even if only for a small subset of entries. this means that you have a fighting chance of purging content faster than it is being added to the cache. again, it seems to be working for us, but i've only been testing it for a few days, and our cache targets and growth seem to be on par with yours; ultimately your mileage may vary.
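
if the IOwait generated by the cleanup loop itself becomes a problem, one thing worth trying (an assumption on my part, not part of the setup above, and note that ionice priorities only take effect with the CFQ scheduler) is to run the loop at idle I/O priority and low CPU priority:

# run the cleanup loop at idle I/O priority and lowest CPU priority (script path assumed)
ionice -c3 nice -n19 /usr/local/bin/htcache-prune.sh &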

of course the main point of this posting is that maybe it will be helpful to someone else who stumbles across this question the same way that i did.

sherrard
2

10 seconds for a directory read sounds like you might not be using dir_index

check with

/sbin/tune2fs /dev/wherever | grep dir_index

how to turn on

tune2fs -O dir_index /dev/wherever

but this will only affect newly created dirs; to reindex all existing directories, run

e2fsck -D -f /dev/wherever
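
note that e2fsck needs the filesystem unmounted; a rough sequence (assuming the cache partition is mounted at /htcache with an fstab entry, and /dev/wherever is still just a placeholder):

# stop Apache so nothing writes to the cache, unmount, reindex, then bring it back
apachectl stop
umount /htcache
e2fsck -D -f /dev/wherever
mount /htcache
apachectl start
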
Aleksandar Ivanisevic