Btrfs
From Btrfs Wiki:
- Btrfs is a modern copy on write (CoW) filesystem for Linux aimed at implementing advanced features while also focusing on fault tolerance, repair and easy administration. Jointly developed at multiple companies, Btrfs is licensed under the GPL and open for contribution from anyone.
Preparation
For user space utilities, install the btrfs-progs package, which is required for basic operations.
If you need to boot from a Btrfs file system (i.e., your kernel and initramfs reside on a Btrfs partition), check if your boot loader supports Btrfs.
File system creation
The following shows how to create a new Btrfs file system. To convert an ext3/4 partition to Btrfs, see #Ext3/4 to Btrfs conversion. To use a partitionless setup, see #Partitionless Btrfs disk.
See mkfs.btrfs(8) for more information.
File system on a single device
To create a Btrfs filesystem on partition /dev/partition:
# mkfs.btrfs -L mylabel /dev/partition
The Btrfs default nodesize for metadata is 16KB, while the default sectorsize for data is equal to page size and autodetected. To use a larger nodesize for metadata (must be a multiple of sectorsize, up to 64KB is allowed), specify a value for the nodesize via the -n switch, as shown in this example using 32KB blocks:
# mkfs.btrfs -L mylabel -n 32k /dev/partition
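The nodesize rules above (a multiple of sectorsize, at most 64KB) can be checked before calling mkfs.btrfs. The following is a minimal sketch: the function name valid_nodesize is hypothetical, and the 4096-byte sectorsize in the usage line is an assumption (a common page size), not a value read from the system.

```shell
# Hypothetical helper: succeed if a nodesize (in bytes) satisfies the
# rules stated above for a given sectorsize (in bytes).
valid_nodesize() {
    nodesize=$1
    sectorsize=$2
    max=$((64 * 1024))                              # upper bound: 64KB
    [ "$nodesize" -ge "$sectorsize" ] || return 1   # must be a positive multiple
    [ "$nodesize" -le "$max" ] || return 1
    [ $((nodesize % sectorsize)) -eq 0 ] || return 1
    return 0
}

# Assumed sectorsize of 4096 bytes; 32k corresponds to 32768 bytes.
if valid_nodesize 32768 4096; then
    echo "32k is an acceptable nodesize"
fi
```

On a real system, the sectorsize argument would be replaced by the actual page size.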
Multi-device file system
Multiple devices can be used to create a RAID. Supported RAID levels include RAID 0, RAID 1, RAID 10, RAID 5 and RAID 6. Starting from kernel 5.5, RAID1c3 and RAID1c4 are also supported, providing 3 and 4 copies respectively at the RAID 1 level. The RAID levels can be configured separately for data and metadata using the -d and -m options respectively. By default, the data has one copy (single) and the metadata is mirrored (raid1). This is similar to creating a JBOD configuration, where disks are seen as one filesystem, but files are not duplicated. See Using Btrfs with Multiple Devices for more information about how to create a Btrfs RAID volume.
# mkfs.btrfs -d single -m raid1 /dev/part1 /dev/part2 ...
You must include either the hook or the hook in in order to use multiple Btrfs devices in a pool. See the Mkinitcpio#Common hooks article for more information.
See #RAID for advice on maintenance specific to multi-device Btrfs file systems.
Configuring the file system
Copy-on-Write (CoW)
By default, Btrfs uses copy-on-write for all files all the time. Writes do not overwrite data in place; instead, a modified copy of the block is written to a new location, and metadata is updated to point at the new location. See the Btrfs Sysadmin Guide section for implementation details, as well as advantages and disadvantages.
Disabling CoW
To disable copy-on-write for newly created files in a mounted subvolume, use the mount option. This will only affect newly created files. Copy-on-write will still happen for existing files. The option also disables compression. See for details.
To disable copy-on-write for single files/directories, do:
$ chattr +C /dir/file
This will disable copy-on-write for those operations in which there is only one reference to the file. If there is more than one reference, e.g. due to file clones / lightweight clones or filesystem snapshots, copy-on-write still occurs. Note that as of coreutils 9.0, cp attempts to perform lightweight copies by default; see for more details.
Compression
Btrfs supports transparent and automatic compression. This reduces the size of files and significantly increases the lifespan of flash-based media by reducing write amplification. See Fedora:Changes/BtrfsByDefault#Compression, , and . It can also improve performance in some cases (e.g. single thread with heavy file I/O), while harming performance in others (e.g. multi-threaded and/or CPU-intensive tasks with large file I/O). Better performance is generally achieved with the fastest compression algorithms, zstd and lzo, and some benchmarks provide detailed comparisons.
LZO has a fixed compression level, whereas ZLIB & ZSTD have a range of levels from 1 (low compression) to 9 (ZLIB) or 15 (ZSTD). Changing the levels will affect CPU and I/O throughput differently, so they should be checked / benchmarked before & after changing.
The compress=alg[:level] mount option enables automatically considering every file for compression, where alg is either zlib, lzo, zstd, or (for no compression). Using this option, btrfs checks whether compressing the first portion of the data shrinks it. If it does, the entire write to that file is compressed. If it does not, none of the write is compressed, even if the rest of the data would shrink tremendously. This is done so that the disk does not have to wait to start writing until all of the data to be written has been handed to btrfs and compressed.
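The "does the first portion shrink?" decision can be illustrated outside the filesystem. The sketch below is only an analogy: it uses gzip rather than any of the Btrfs algorithms, the function name first_portion_shrinks is hypothetical, and the 128KiB default sample size is an arbitrary choice for this example, not a value taken from Btrfs.

```shell
# Analogy only: succeed if the first $2 bytes of file $1 become smaller
# under gzip. Btrfs uses its own algorithms and its own sample size.
first_portion_shrinks() {
    file=$1
    portion=${2:-131072}    # arbitrary sample size chosen for this sketch
    orig=$(($(head -c "$portion" "$file" | wc -c)))
    comp=$(($(head -c "$portion" "$file" | gzip -c | wc -c)))
    [ "$comp" -lt "$orig" ]
}

# A file of zero bytes is highly compressible, so the check succeeds.
tmp=$(mktemp)
head -c 65536 /dev/zero > "$tmp"
if first_portion_shrinks "$tmp"; then
    echo "first portion shrinks: the write would be compressed"
fi
rm -f "$tmp"
```

A file whose beginning is already compressed (or random) fails the check, which mirrors the case where no compression is applied even though later data might shrink.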
The compress-force=alg[:level] mount option can be used instead, which makes btrfs skip checking whether compression shrinks the first portion and attempt compression for every file. In a worst-case scenario, this can cause (slightly) more CPU usage for no purpose. However, empirical testing on multiple mixed-use systems showed a significant improvement of about 10% disk compression from using compress-force over just compress, which also had 10% disk compression.
Only files created or modified after the mount option is added will be compressed.
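The level ranges given above (none for LZO, 1 to 9 for ZLIB, 1 to 15 for ZSTD) can be enforced when assembling a mount option in the alg[:level] form. This is a minimal sketch; the helper name compress_opt is hypothetical and it only builds a string, it does not mount anything.

```shell
# Hypothetical helper: print a compress=alg[:level] mount option,
# rejecting levels outside the ranges described above.
compress_opt() {
    alg=$1
    level=${2:-}
    case $alg in
        lzo)  max=0 ;;      # fixed level: no level may be given
        zlib) max=9 ;;
        zstd) max=15 ;;
        *)    return 1 ;;   # unknown algorithm
    esac
    if [ -z "$level" ]; then
        echo "compress=$alg"
        return 0
    fi
    [ "$level" -ge 1 ] && [ "$level" -le "$max" ] || return 1
    echo "compress=$alg:$level"
}

compress_opt zstd 3    # prints compress=zstd:3
```

The printed string can then be placed in the options field of an fstab entry or passed to mount -o.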
To apply compression to existing files, use the btrfs filesystem defragment -calg command, where alg is either zlib, lzo or zstd. For example, in order to re-compress the whole file system with zstd, run the following command:
# btrfs filesystem defragment -r -v -czstd /
To enable compression when installing Arch to an empty Btrfs partition, use the compress option when mounting the file system: . During configuration, add compress=zstd to the mount options of the root file system in fstab.
- Systems using older kernels or btrfs-progs without zstd support may be unable to read or repair your filesystem if you use this option.
- GRUB introduced zstd support in 2.04. Make sure you have actually upgraded the bootloader installed in your MBR/ESP since then, by running grub-install with the appropriate options for your BIOS/UEFI setup, since that is not done automatically. See FS#63235.
View compression types and ratios
compsize takes a list of files (or an entire btrfs filesystem) and measures compression types used and effective compression ratios. Uncompressed size may not match the number given by other programs such as , because every extent is counted once, even if it is reflinked several times, and even if a part of it is no longer used anywhere but has not been garbage collected. The option keeps it on a single filesystem, which is useful in situations like to prevent it from attempting to look in non-btrfs subdirectories and failing the entire run.
Subvolumes
"A btrfs subvolume is not a block device (and cannot be treated as one) instead, a btrfs subvolume can be thought of as a POSIX file namespace. This namespace can be accessed via the top-level subvolume of the filesystem, or it can be mounted in its own right."
Each Btrfs file system has a top-level subvolume with ID 5. It can be mounted as / (by default), or another subvolume can be mounted instead. Subvolumes can be moved around in the filesystem and are identified by their ID rather than by their path.
Creating a subvolume
To create a subvolume:
# btrfs subvolume create /path/to/subvolume
Listing subvolumes
To see a list of current subvolumes and their IDs under path:
# btrfs subvolume list -p path
Deleting a subvolume
To delete a subvolume:
# btrfs subvolume delete /path/to/subvolume
Since Linux 4.18, one can also delete a subvolume like a regular directory (, ).
Mounting subvolumes
Subvolumes can be mounted like file system partitions using the subvol=/path/to/subvolume or subvolid mount flags. For example, you could have a subvolume named and mount it as . One can mimic traditional file system partitions by creating various subvolumes under the top level of the file system and then mounting them at the appropriate mount points. It is preferable to mount using subvol=/path/to/subvolume, rather than the subvolid, as the subvolid may change when restoring #Snapshots, requiring a change of mount configuration.
It can be preferable not to mount the top-level subvolume as / (which is done by default). Instead, consider creating a subvolume to house your actual data and mounting it as /.
See Snapper#Suggested filesystem layout, Btrfs SysadminGuide#Managing Snapshots, and Btrfs SysadminGuide#Layout for example file system layouts using subvolumes.
See for a full list of btrfs-specific mount options.
Mounting subvolume as root
To use a subvolume as the root mountpoint, specify the subvolume via a kernel parameter using . Edit the root mountpoint in and specify the mount option . Alternatively, the subvolume can be specified with its id, as kernel parameter and as mount option in . It is preferable to mount using subvol=/path/to/subvolume, rather than the subvolid, as the subvolid may change when restoring #Snapshots, requiring a change of mount configuration, or else the system will not boot.
Changing the default sub-volume
The default sub-volume is mounted if no mount option is provided. To change the default subvolume, do:
# btrfs subvolume set-default subvolume-id /
where subvolume-id can be found by listing the subvolumes (see #Listing subvolumes).
Changing the default subvolume with btrfs subvolume set-default will make the top level of the filesystem inaccessible, except by use of the or subvolid=5 mount options.
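Looking up the subvolume-id by path can be scripted. The sketch below assumes the usual listing format of btrfs subvolume list ("ID n gen n top level n path name") and runs on a made-up sample instead of a real filesystem. The helper name subvol_id is hypothetical, and paths containing spaces are not handled.

```shell
# Hypothetical helper: read `btrfs subvolume list` output on stdin and
# print the ID of the subvolume whose path equals $1.
subvol_id() {
    awk -v want="$1" '$1 == "ID" && $NF == want { print $2 }'
}

# Illustrative sample of the listing (made-up IDs and names).
sample='ID 256 gen 30 top level 5 path @
ID 257 gen 28 top level 5 path @home'

printf '%s\n' "$sample" | subvol_id @home    # prints 257
```

On a real system, the input would come from btrfs subvolume list, and the printed ID would be passed to btrfs subvolume set-default.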
Quota
Quota support in Btrfs is implemented at a subvolume level by the use of quota groups, or qgroups: each subvolume is assigned a quota group in the form of 0/subvolume_id by default. However, it is possible to create a quota group using any number if desired.
To use qgroups, you need to enable quota first using
# btrfs quota enable path
From this point onwards, newly created subvolumes will be controlled by those groups. In order to retrospectively enable them for already existing subvolumes, enable quota normally, then create a qgroup (quota group) for each of those subvolumes using their subvolume_id and rescan them:
# btrfs subvolume list path | cut -d' ' -f2 | xargs -I{} -n1 btrfs qgroup create 0/{} path
# btrfs quota rescan path
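To see what the first command would do before running it, the same field extraction can be applied to a sample listing in a dry run. This is a sketch under stated assumptions: the listing format ("ID n gen n top level n path name") and the sample IDs are illustrative, /mnt stands in for path, the helper name plan_qgroups is hypothetical, and a read loop replaces xargs so that the commands are only printed.

```shell
# Hypothetical dry-run helper: read a subvolume listing on stdin and
# print (not run) one qgroup creation command per subvolume ID.
plan_qgroups() {
    target=$1
    # Field 2 of each listing line is the subvolume ID.
    cut -d' ' -f2 | while read -r id; do
        echo "btrfs qgroup create 0/$id $target"
    done
}

# Illustrative sample of the listing (made-up IDs and names).
sample_list='ID 256 gen 30 top level 5 path @
ID 257 gen 28 top level 5 path @home'

printf '%s\n' "$sample_list" | plan_qgroups /mnt
```

Each printed line corresponds to one invocation that the xargs pipeline above would perform.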
Quota groups in Btrfs form a tree hierarchy, whereby qgroups are attached to subvolumes. The size limits are set per qgroup and apply when any limit is reached in the tree that contains a given subvolume.
Limits on quota groups can be applied either to the total data usage, un-shared data usage, compressed data usage or both. File copy and file deletion may both affect limits, since the unshared limit of another qgroup can change if the original volume's files are deleted and only one copy remains. For example, a fresh snapshot shares almost all the blocks with the original subvolume; new writes to either subvolume will count towards the exclusive limit, and deletions of common data in one volume count towards the exclusive limit in the other one.
To apply a limit to a qgroup, use the command . Depending on your usage, either use a total limit, unshared limit () or compressed limit (-c).
To show usage and limits for a given path within a filesystem, use
# btrfs qgroup show -reF path
Commit interval
The resolution at which data are written to the filesystem is dictated by Btrfs itself and by system-wide settings. Btrfs defaults to a 30-second checkpoint interval in which new data are committed to the filesystem. This can be changed by appending the commit mount option in fstab for the btrfs partition.
LABEL=arch64 / btrfs defaults,compress=zstd,commit=120 0 0
System-wide settings also affect commit intervals. They include the files under and are out-of-scope of this wiki article. The kernel documentation for them is available at https://docs.kernel.org/admin-guide/sysctl/vm.html.
SSD TRIM
A Btrfs filesystem is able to free unused blocks from an SSD drive supporting the TRIM command. Starting with kernel version 5.6, there is asynchronous discard support, enabled with mount option . Freed extents are not discarded immediately, but grouped together and trimmed later by a separate worker thread, improving commit latency.
More information about enabling and using TRIM can be found in Solid State Drives#TRIM.
Usage
Swap file
Swap files in Btrfs are supported since Linux kernel 5.0. The proper way to initialize a swap file is to first create a non-snapshotted subvolume to host the file and then set the C attribute (no copy-on-write) on the whole directory with chattr.
# chattr +C /path/to/swapsubvolume
From now on, any new file created inside the swap subvolume will have the C attribute set.
Continue with the steps in Swap file#Swap file creation. Configuring hibernation to a swap file is described in Power management/Suspend and hibernate#Hibernation into swap file on Btrfs.
Displaying used/free space
General Linux userspace tools such as will inaccurately report free space on a Btrfs partition. It is recommended to use btrfs filesystem usage to query Btrfs partitions. For example, for a full breakdown of device allocation and usage stats:
# btrfs filesystem usage /
Alternatively, btrfs filesystem df allows a quick check on usage of allocated space without the requirement to run as root:
$ btrfs filesystem df /
The same limitations apply to tools which analyze space usage for some subset of the filesystem, such as or , as they do not take into account reflinks, snapshots and compression. Instead, see and compsize for btrfs-aware alternatives.
Defragmentation
Btrfs supports online defragmentation through the autodefrag mount option; see . On kernel versions where autodefrag should be avoided, one should even use noautodefrag to make sure online defragmentation is disabled. See and .
To manually defragment your root, use:
# btrfs filesystem defragment -r /
Using the above command without the -r switch will result in only the metadata held by the subvolume containing the directory being defragmented. This allows for single file defragmentation by simply specifying the path.
RAID
Btrfs offers native "RAID" for #Multi-device file systems. Notable features which set btrfs RAID apart from mdadm are self-healing redundant arrays and online balancing. See the Btrfs wiki page for more information. The Btrfs sysadmin page also has a section with some more technical background.
Scrub
The Btrfs Wiki Glossary says that Btrfs scrub is "[a]n online filesystem checking tool. Reads all the data and metadata on the filesystem and uses checksums and the duplicate copies from RAID storage to identify and repair any corrupt data."
Start manually
To start a (background) scrub on the filesystem which contains /:
# btrfs scrub start /
To check the status of a running scrub:
# btrfs scrub status /
Start with a service or timer
The btrfs-progs package brings the unit for monthly scrubbing the specified mountpoint. Enable the timer with an escaped path, e.g. for and for . You can use systemd-escape -p /path/to/mountpoint to escape the path; see for details.
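To give a feel for what an escaped path looks like, the sketch below imitates systemd-escape -p for the simplest cases only. It is a simplified stand-in, not the real tool: the function name escape_simple_path is hypothetical, and paths containing "-", ".", spaces or other special characters (which the real tool encodes as \xNN sequences) are deliberately out of scope.

```shell
# Simplified stand-in for `systemd-escape -p`: only handles paths made of
# letters, digits, "_" and "/". Anything else needs the real tool.
escape_simple_path() {
    case $1 in
        /) echo "-"; return 0 ;;    # the root directory maps to a single dash
    esac
    p=${1#/}          # drop the leading slash
    p=${p%/}          # drop a trailing slash, if any
    printf '%s\n' "$p" | tr '/' '-'
}

escape_simple_path /            # prints -
escape_simple_path /home        # prints home
escape_simple_path /mnt/data    # prints mnt-data
```

For anything beyond such simple paths, rely on systemd-escape itself rather than on this approximation.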
You can also run the scrub by starting (with the same encoded path). The advantage of this over (as the root user) is that the results of the scrub will be logged in the systemd journal.
On large NVMe drives with insufficient cooling (e.g. in a laptop), scrubbing can read the drive fast enough and long enough to get it very hot. If you are running scrubs with systemd, you can easily limit the rate of scrubbing with the IOReadBandwidthMax option described in by using a drop-in file.
Balance
"A balance passes all data in the filesystem through the allocator again. It is primarily intended to rebalance the data in the filesystem across the devices when a device is added or removed. A balance will regenerate missing copies for the redundant RAID levels, if a device has failed." See Upstream FAQ page.
On a single-device filesystem, a balance may be also useful for (temporarily) reducing the amount of allocated but unused (meta)data chunks. Sometimes this is needed for fixing "filesystem full" issues.
# btrfs balance start --bg /
# btrfs balance status /
Snapshots
"A snapshot is simply a subvolume that shares its data (and metadata) with some other subvolume, using btrfs's COW capabilities." See Btrfs Wiki SysadminGuide#Snapshots for details.
To create a snapshot:
# btrfs subvolume snapshot source [dest/]name
To create a readonly snapshot, add the -r flag. To create a writable version of a readonly snapshot, simply create a snapshot of it.
Send/receive
A subvolume can be sent to stdout or a file using the btrfs send command. This is usually most useful when piped to a btrfs receive command. For example, to send a snapshot named /root_backup (perhaps a snapshot you made of / earlier) to /backup, you would do the following:
# btrfs send /root_backup | btrfs receive /backup
The snapshot that is sent must be readonly. The above command is useful for copying a subvolume to an external device (e.g. a USB disk mounted at /backup above).
You can also send only the difference between two snapshots. For example, if you have already sent a copy of /root_backup above and have made a new readonly snapshot on your system named /root_backup_new, then to send only the incremental difference to /backup, do:
# btrfs send -p /root_backup /root_backup_new | btrfs receive /backup
Now, a new subvolume named root_backup_new will be present in /backup.
See Btrfs Wiki's Incremental Backup page and #Incremental backup to external drive on how to use this for incremental backups and for tools that automate the process.
Deduplication
Using copy-on-write, Btrfs is able to copy files or whole subvolumes without actually copying the data. However, whenever a file is altered, a new proper copy is created. Deduplication takes this a step further by actively identifying blocks of data which share common sequences and combining them into an extent with the same copy-on-write semantics.
Tools dedicated to deduplicating a Btrfs-formatted partition include duperemove, , and btrfs-dedup. One may also want to merely deduplicate data at the file level instead, using e.g. , or . For an overview of available features of those programs and additional information, have a look at the upstream Wiki entry.
Furthermore, Btrfs developers are working on inband (also known as synchronous or inline) deduplication, meaning deduplication done when writing new data to the filesystem. Currently, it is still an experiment which is developed out-of-tree. Users willing to test the new feature should read the appropriate kernel wiki page.
Resizing
You can grow a file system to the maximum space available on the device, or specify an exact size. Ensure that you grow the size of the device or logical volume before you attempt to increase the size of the file system. When specifying an exact size for the file system on a device, either increasing or decreasing, ensure that the new size satisfies the following conditions:
- The new size must be greater than the size of the existing data; otherwise, data loss occurs.
- The new size must be equal to or less than the current device size because the file system size cannot extend beyond the space available.
To extend the file system size to the maximum available size of the device:
# btrfs filesystem resize max /
To extend the file system to a specific size:
# btrfs filesystem resize size /
Replace size with the desired size in bytes. You can also specify units on the value, such as K (kibibytes), M (mebibytes), or G (gibibytes). Alternatively, you can specify an increase or decrease to the current size by prefixing the value with a plus (+) or a minus (-) sign, respectively:
# btrfs filesystem resize +size /
# btrfs filesystem resize -size /
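The two conditions on an exact target size (larger than the existing data, no larger than the device) can be checked up front. The following is a minimal sketch with hypothetical helper names (to_bytes, resize_ok) and made-up figures. It only does arithmetic on the K/M/G units mentioned above and does not query a real filesystem.

```shell
# Hypothetical helper: convert a size such as 512M or 20G to bytes,
# using the binary units named above (K, M, G).
to_bytes() {
    num=${1%[KMG]}
    case $1 in
        *K) echo $((num * 1024)) ;;
        *M) echo $((num * 1024 * 1024)) ;;
        *G) echo $((num * 1024 * 1024 * 1024)) ;;
        *)  echo "$num" ;;          # plain number: already bytes
    esac
}

# Succeed only if used < new size <= device size (all in bytes).
resize_ok() {
    used=$1
    new=$2
    device=$3
    [ "$new" -gt "$used" ] && [ "$new" -le "$device" ]
}

# Made-up figures: 12G of data, a 20G target, a 32G device.
if resize_ok "$(to_bytes 12G)" "$(to_bytes 20G)" "$(to_bytes 32G)"; then
    echo "20G is a valid target size"
fi
```

In practice, the used and device figures would be taken from the output of the space-reporting commands described under #Displaying used/free space.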
Known issues
A few limitations should be known before trying.
Encryption
Btrfs has no built-in encryption support, but this may come in the future. Users can encrypt the partition before running mkfs.btrfs. See dm-crypt/Encrypting an entire system#Btrfs subvolumes with swap.
Existing Btrfs file systems can use something like EncFS or TrueCrypt, though perhaps without some of Btrfs' features.
btrfs check issues
The btrfs check tool has known issues and should not be run without further reading; see section #btrfs check.
Tips and tricks
Partitionless Btrfs disk
Btrfs can occupy an entire data storage device, replacing the MBR or GPT partitioning schemes, using subvolumes to simulate partitions. However, using a partitionless setup is not required to simply create a Btrfs filesystem on an existing partition that was created using another method. There are some limitations to partitionless single disk setups:
- Cannot place other file systems on another partition on the same disk.
- Due to the previous point, having an ESP on this disk is not possible. Another device is necessary for UEFI boot.
- If using a Linux kernel version before 5.0, you cannot use a swap area, as Btrfs did not support swap files pre-5.0 and there is no place to create a swap partition.
To overwrite the existing partition table with Btrfs, run the following command:
# mkfs.btrfs /dev/sdX
For example, use /dev/sda rather than /dev/sda1. The latter would format an existing partition instead of replacing the entire partitioning scheme. Because the root partition is Btrfs, make sure btrfs is compiled into the kernel, or put btrfs into mkinitcpio.conf#MODULES and regenerate the initramfs.
Install the boot loader like you would for a data storage device with a Master Boot Record. See Syslinux#Manual install or GRUB/Tips and tricks#Install to partition or partitionless disk. If your kernel does not boot due to , please add in and generate the grub configuration.
Ext3/4 to Btrfs conversion
Boot from an install CD, then convert by doing:
# btrfs-convert /dev/partition
Mount the partition and test the conversion by checking the files. Be sure to change the fstab to reflect the change (type to btrfs and fs_passno [the last field] to 0, as Btrfs does not do a file system check on boot). Also note that the UUID of the partition will have changed, so update fstab accordingly when using UUIDs. Chroot into the system and rebuild your boot loader's menu list (see Install from existing Linux). If converting a root filesystem, while still chrooted, run to regenerate the initramfs or the system will not successfully boot.
After confirming that there are no problems, complete the conversion by deleting the backup sub-volume. Note that you cannot revert back to ext3/4 without it.
# btrfs subvolume delete /ext2_saved
Finally, balance the file system to reclaim the space.
Remember that some applications which were installed prior have to be adapted to Btrfs.
Checksum hardware acceleration
CRC32 is a new instruction in Intel SSE4.2. To verify if Btrfs checksum is hardware accelerated:
If you see , it is probably because your root partition is Btrfs, and you will have to compile crc32c-intel into the kernel to make it work. Putting crc32c-intel into mkinitcpio.conf does not work.
Corruption recovery
btrfs-check cannot be used on a mounted file system. To be able to use btrfs-check without booting from a live USB, add it to the initial ramdisk:
Then if there is a problem booting, the utility is available for repair.
See for more information.
Booting into snapshots
In order to boot into a snapshot, the same procedure applies as for mounting a subvolume as your root partition, as given in #Mounting subvolume as root, because snapshots can be mounted like subvolumes.
Use Btrfs subvolumes with systemd-nspawn
See the Systemd-nspawn#Use Btrfs subvolume as container root and Systemd-nspawn#Use temporary Btrfs snapshot of container articles.
Reducing access time metadata updates
Because of the copy-on-write nature of Btrfs, simply accessing files can trigger the metadata copy and writing. Reducing the frequency of access time updates may eliminate this unexpected disk usage and increase performance. See fstab#atime options for the available options.
Incremental backup to external drive
The following packages use btrfs send and btrfs receive to send backups incrementally to an external drive. Refer to their documentation to see differences in implementation, features, and requirements.
- btrbk — Tool for creating snapshots and remote backups of Btrfs subvolumes.
The following package allows backing up snapper snapshots to non-Btrfs file systems.
Troubleshooting
See the Btrfs Problem FAQ for general troubleshooting.
Partition offset
The offset problem may happen when you try to embed into a partitioned disk. It means that it is OK to embed GRUB's into a Btrfs pool on a partitionless disk (e.g. ) directly.
GRUB can boot Btrfs partitions; however, the module may be larger than for other file systems, and the file made by grub-install may not fit in the first 63 sectors (31.5KiB) of the drive between the MBR and the first partition. Up-to-date partitioning tools such as and avoid this issue by offsetting the first partition by roughly 1MiB or 2MiB.
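The figures above follow from simple arithmetic, sketched here under the assumption of traditional 512-byte sectors (the sector size is not stated in the text, and the variable names are illustrative).

```shell
sector_bytes=512                      # assumed traditional sector size

# Space covered by the first 63 sectors of the drive.
gap_bytes=$((63 * sector_bytes))
echo "$gap_bytes"                     # prints 32256 (bytes, i.e. 31.5KiB)

# First sector of a partition that starts at a 1MiB offset.
offset_sectors=$((1024 * 1024 / sector_bytes))
echo "$offset_sectors"                # prints 2048
```

With a 1MiB offset, the space before the first partition is therefore more than 32 times larger than with the old 63-sector layout.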
Missing root
Users experiencing the error no such device: root when booting from a RAID-style setup should edit /usr/share/grub/grub-mkconfig_lib and remove both quotes from the line . Regenerate the config for GRUB and the system should boot without an error.
Mounting timed out
Sometimes, especially with large RAID1 arrays, mounting might time out during boot with a journal message such as:
This can easily be worked around by providing a longer timeout via the systemd-specific x-systemd.mount-timeout mount option in fstab. For example:
/dev/sda /storage btrfs rw,relatime,x-systemd.mount-timeout=5min 0 0
BTRFS: open_ctree failed
As of November 2014, there seems to be a bug in systemd or mkinitcpio causing the following error on systems with multi-device Btrfs filesystem using the hook in :
A workaround is to remove from the array in /etc/mkinitcpio.conf and instead add to the array. Then regenerate the initramfs and reboot.
You will get the same error if you try to mount a raid array without one of the devices. In that case, you must add the mount option to . If your root resides on the array, you must also add to your kernel parameters.
As of August 2016, a potential workaround for this bug is to mount the array by a single drive only in , and allow btrfs to discover and append the other drives automatically. Group-based identifiers such as UUID and LABEL appear to contribute to the failure. For example, a two-device RAID1 array consisting of 'disk1' and 'disk2' will have a UUID allocated to it, but instead of using the UUID, use only in . For a more detailed explanation, see the following blog post.
Another possible workaround is to remove the hook in mkinitcpio.conf and replace it with the systemd hook. In this case, should not be in the or arrays.
See the original forums thread and for further information and discussion.
btrfs check
The btrfs check command can be used to check or repair an unmounted Btrfs filesystem. However, this repair tool is still immature and is not able to repair certain filesystem errors, even those that do not render the filesystem unmountable.
See also
- Official site
- Performance related
- Miscellaneous
- Funtoo:BTRFS Fun
- Avi Miller presenting Btrfs at SCALE 10x, January 2012.
- Summary of Chris Mason's talk from LFCS 2012
- Btrfs: stop providing a bmap operation to avoid swapfile corruptions 2009-01-21
- Doing Fast Incremental Backups With Btrfs Send and Receive