11

Problem

I want to enable background TRIM operations on a swap partition on an SSD under Linux. According to several articles, e.g. this one, the kernel detects this configuration and automatically performs discard operations, but in my tests it does not seem to work, even though the “discard” mount option is used to force this behavior.

Scenario

  • Debian Wheezy running Linux 3.2.0
  • SSD disk: 1 x 120GB OCZ Vertex 3 MI
  • 2GB swap “plain” partition, w/o other layers (LVM, RAID, etc.)

Background

These are the steps I follow to check if background TRIM is working on the swap partition:

  1. TRIM support: check if the SSD disk supports TRIM commands and the kernel flags the device as non-rotational:

    # hdparm -I /dev/sda | grep TRIM
     * Data Set Management TRIM supported (limit 1 block)
     * Deterministic read data after TRIM
    
    # cat /sys/block/sda/queue/rotational
    0
    
  2. Swap fill-up: enable the swap partition, drop all VM caches and configure Linux to swap aggressively by setting vm.swappiness to 100. Then run a script that allocates all the available memory and forces the kernel to start swapping:

    # swapon [--discard] /dev/sda2
    # echo 3 > /proc/sys/vm/drop_caches
    # echo 100 > /proc/sys/vm/swappiness
    # ./fill-up-memory.up
    

    The script runs on a server with 32GB of physical memory plus a 2GB swap partition and creates a ~33.8GB object in memory, which is enough to fill up all the memory and start swapping. This is an example of a script that achieves this behavior:

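    #!/usr/bin/python
    # Python 2 script (uses raw_input).
    
    mem = 33.8  # GB to allocate: 32GB of RAM plus most of the 2GB swap
    testing = 'A' * int(1024 * 1024 * 1024 * mem)  # one huge string of 'A' (0x41) bytes
    raw_input()  # keep the memory allocated until Enter is pressed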
    
  3. Check swap content: “swapon -s” shows that 100% of the swap space is used. Using “hdparm --read-sector” I check the raw content of the swap partition sectors, and all bytes are set to “4141”, the hexadecimal notation for the “A” character, so everything works as expected. This is an example script that reads the content of the swap partition sector by sector:

    #!/bin/bash
    
    for sector in $(seq 194560 4100095); do
        hdparm --read-sector "$sector" /dev/sda
    done
    

NOTE: you can get the start/end sector of the swap partition using parted, cfdisk, etc.
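Spawning hdparm once per sector is slow for a partition of this size. As a rough alternative, the sketch below reads a device directly and counts how many sectors are entirely “A” (0x41), entirely zero, or something else. It assumes 512-byte sectors and the /dev/sda2 partition from the steps above; the function name `classify_sectors` is hypothetical. Unlike “hdparm --read-sector”, reads through the block device may be served from the page cache, so treat the result as a quick indication and cross-check it with hdparm.

```python
#!/usr/bin/env python3
import sys
from collections import Counter

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def classify_sectors(stream, sector_size=SECTOR_SIZE, max_sectors=None):
    """Count sectors that are all 'A' (0x41), all zero, or anything else."""
    counts = Counter()
    filled = b"A" * sector_size
    zeroed = b"\x00" * sector_size
    seen = 0
    while max_sectors is None or seen < max_sectors:
        sector = stream.read(sector_size)
        if len(sector) < sector_size:
            break  # end of device, or a short trailing read
        if sector == filled:
            counts["filled_0x41"] += 1
        elif sector == zeroed:
            counts["zeroed"] += 1
        else:
            counts["other"] += 1
        seen += 1
    return counts


# Only touches a device when a path is passed explicitly (needs root),
# e.g. with the partition used above: /dev/sda2
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as dev:
        print(dict(classify_sectors(dev)))
```

Passing `max_sectors` limits the scan to the first sectors of the device, which is usually enough to see whether anything changed after the swap was released.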

When I stop the script, it releases all the memory, including the swap allocations, and “swapon -s” reports no swap usage in the system. At this point, Linux is expected to start discarding the content of the swap partition in the background, but this does not happen: the content of the sectors is still “4141”, even several hours later.

I have run several tests and it seems that Linux only performs a full discard when the partition is enabled through the swapon() system call, but never in the background, even though the “discard” mount option is enabled in /etc/fstab.

Further research: blkdev_issue_discard() is the kernel function in charge of sending TRIM commands to the underlying SSD devices. There are only two references to this function in mm/swapfile.c:

  • discard_swap(): called during the swapon() process; if the “discard” mount option is enabled, it discards all the content. This works as expected.
  • discard_swap_cluster(): it should discard the content of a swap cluster, but it seems that it never issues a TRIM command.

Question: what is the expected behavior of Linux with swap on SSD devices? Should it discard all free sectors/pages, or only issue an initial full discard when the partition is enabled during the boot process? Thanks.

NetVicious
santisaez
  • 4
    What's the point? RAM is cheap, as you're adequately proving by having 32 big ones in your server. Turn off Swap, use your SSD for something useful, and stop bitfricking about. – Tom O'Connor Aug 13 '13 at 14:49
  • 3
    **Swap can't be disabled** on those servers and they have a single SSD, so there's no option to host the swap partition on a traditional HDD. I'm aware that putting swap on an SSD is not the best option, but I was wondering if I can achieve the same "discard" behavior as ext4 on swap partitions, to improve disk performance as much as possible. – santisaez Aug 13 '13 at 15:10
  • 2
    This REALLY sounds like a case of premature optimization. – MikeyB Aug 20 '13 at 01:09
  • "Comments may only be edited for 5 minutes" - serves me right being on SF while at work....as I was saying; @MikeyB Actually, I've been reading up on this. The wikipedia article mentioned something I wasn't aware of. "Due to the nature of flash memory's operation, data cannot be directly overwritten as it can in a hard disk drive." So it would make sense that the previously used blocks in swap would be empty....but would those look like "0000" when santisaez checks the swap contents? – Signal15 Aug 23 '13 at 13:53
  • That all happens at a layer below the operating system. As far as the OS is concerned, the data on a block is there until it gets rewritten. It's the drive's responsibility to handle the read-erase-write cycle. – MikeyB Aug 23 '13 at 14:12
  • @MikeyB True, as I read it, that's what should happen. But when santisaez does a swap-fill & sector-check he's finding "4141" instead of whatever-would-be-there-after-an-erase-op. The kicker is, I don't have an SSD I can test this on. (Because I would *sooooo* test this out. This is a good question.) – Signal15 Aug 23 '13 at 16:47

3 Answers

1

It seems that discard_swap_cluster is only called from scan_swap_map which in turn is called from get_swap_page or get_swap_page_of_type. So if I'm correct, the discarding only happens when a new swap page is going to be allocated, not when a page is freed.
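To illustrate that reasoning, here is a toy model in Python. It is not the kernel's code: the class `ToySwap`, the constant `CLUSTER_PAGES` and the tiny cluster size are all hypothetical, and the model only captures the idea described above, that a discard is issued when a free cluster is about to be reused and not when pages are freed.

```python
CLUSTER_PAGES = 4  # toy value; the kernel's cluster size is much larger


class ToySwap:
    """Toy model: discard happens on cluster allocation, never on free."""

    def __init__(self, clusters):
        npages = clusters * CLUSTER_PAGES
        self.in_use = [False] * npages
        self.on_disk = ["empty"] * npages  # what a raw sector read would show
        self.discards = 0

    def _cluster_is_free(self, start):
        return not any(self.in_use[start:start + CLUSTER_PAGES])

    def allocate(self, data):
        """Write one page; discard a whole free cluster before reusing it."""
        for page, used in enumerate(self.in_use):
            if used:
                continue
            start = page - page % CLUSTER_PAGES
            if page == start and self._cluster_is_free(start):
                # Stand-in for discard_swap_cluster(): TRIM the stale cluster.
                self.on_disk[start:start + CLUSTER_PAGES] = ["trimmed"] * CLUSTER_PAGES
                self.discards += 1
            self.in_use[page] = True
            self.on_disk[page] = data
            return page
        raise MemoryError("toy swap is full")

    def free(self, page):
        """Mark the page unused; the stale data stays on disk."""
        self.in_use[page] = False


swap = ToySwap(clusters=2)
pages = [swap.allocate("A") for _ in range(2 * CLUSTER_PAGES)]
for page in pages:
    swap.free(page)
print(swap.on_disk, swap.discards)
```

In this model, freeing every page leaves all entries of `on_disk` set to "A" and the discard counter unchanged, which would match the “4141” observation in the question; only the next allocation trims a cluster.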

lav
1

It could be that your system has --discard=once as default. Have you tried mounting with a specific discard option?

# nano /etc/fstab
________________________________________________________________
...
/dev/sda2    none    swap    ...,discard=pages,...    ...
...

and forcing like this:

# swapon --discard=pages /dev/sda2

You could also try to make a fstrim service, or configure it if it's already available.

kgizdov
-1

When I stop the script, it releases all the memory, including the swap allocations, and “swapon -s” reports no swap usage in the system. At this point, Linux is expected to start discarding the content of the swap partition in the background, but this does not happen: the content of the sectors is still “4141”, even several hours later.

The contents of swap are effectively "discarded" when swapon -s returns "no swap used". The system is not going to overwrite the contents of the blocks (filled with "4141") because it's an SSD and excessive writes would shorten the life of the SSD. (At least, that's what I take away from the documentation.)

Signal15
  • 5
    If the `discard` mount option is used, TRIM commands should be sent to the underlying solid-state drive to avoid the [write amplification](http://en.wikipedia.org/wiki/Write_amplification) issue on SSDs. At least, this is how other filesystems, like ext4, behave. – santisaez Aug 23 '13 at 11:53
  • To be clear, that would indeed result in reading only zeroes with that hdparm command, but only after the SSD's garbage collector had a chance to run. – Halfgaar Dec 28 '14 at 21:18