
I am creating zpools on a FreeBSD machine. While doing so, I observe these two cases:

  1. If I take raw disks and create a zpool directly on them, the pool is created and works perfectly.

  2. If I first partition the disks with gpart, using the freebsd-zfs partition type, and then create the zpool on the partitions, that also works perfectly.

What I am confused about is: which approach is better for creating zpools?
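For reference, the two approaches can be sketched roughly like this (the pool name and device names such as `da0`/`da1` are placeholders; adjust them for your system):

```shell
# Approach 1: hand ZFS the raw disks (hypothetical devices da0, da1).
zpool create tank mirror da0 da1

# Approach 2: partition first with gpart, then build the pool on the
# freebsd-zfs partitions (the GPT labels "disk0"/"disk1" are made up here).
gpart create -s gpt da0
gpart add -t freebsd-zfs -a 1m -l disk0 da0
gpart create -s gpt da1
gpart add -t freebsd-zfs -a 1m -l disk1 da1
zpool create tank mirror gpt/disk0 gpt/disk1
```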

  • [This forum thread](https://forums.freebsd.org/threads/zfs-whole-disk-vs-gpt-slice.62855/) also has some interesting and relevant discussion. – Albert Jan 11 '21 at 11:54

4 Answers


Use one slice/partition dedicated to ZFS per physical disk and leave some space unpartitioned. That way, if you ever need to replace a drive and the replacement is 10 sectors smaller, you will still be able to do it (http://www.freebsddiary.org/zfs-with-gpart.php).

That's what Solaris does automatically, that's what FreeNAS does (https://forums.freenas.org/index.php?threads/zfs-on-partitioned-disks.37079/), and that's what ZFS on Linux (ZoL) does when you give it a whole disk – it will partition it itself...
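That layout can be sketched on FreeBSD roughly as follows (the device `ada1`, the label, and the partition size are illustrative assumptions; the point is simply to size the partition slightly below the disk's full capacity):

```shell
# Create a GPT on the disk (hypothetical device ada1).
gpart create -s gpt ada1

# -a 1m aligns the partition to 1 MiB; -s caps its size so that a small
# amount of space stays unpartitioned at the end of the disk.
gpart add -t freebsd-zfs -a 1m -s 3725g -l zfs-ada1 ada1

# Build the pool on the labeled partition rather than the raw disk.
zpool create tank gpt/zfs-ada1
```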

The overhead of translating a position on the partition to a position on the actual device is negligible. So once the partition is correctly aligned to the physical sector boundary, there is no reason for it to behave any differently from a whole block device.

With ZoL, the only difference I am aware of is that ZoL switches the disk I/O scheduler to noop when a whole disk is given to the vdev. Nothing stops you from setting that manually.
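On Linux you can inspect and change the scheduler through sysfs; a sketch (`sda` is a placeholder, and on modern multi-queue kernels the noop equivalent is called `none`):

```shell
# Show the available schedulers; the active one appears in brackets:
cat /sys/block/sda/queue/scheduler

# Switch to "none" (use "noop" on older, single-queue kernels):
echo none > /sys/block/sda/queue/scheduler
```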

There are some don'ts, though... Don't create more than one ZFS partition per disk, and if you decide to ignore that advice, never ever build vdevs from them in the same zpool. That will basically kill performance: ZFS stripes data across the vdevs, so sequential I/O turns into a seek nightmare...

Grogi
  • Can you shed some more light on the ZFS behaviour – the seek nightmare – when disk slices are shared/used? – sherpaurgen Aug 16 '18 at 16:04
  • Simplest scenario - you build a pool from two vdevs, each a partition on the same harddisk. Now, you want to write a data chunk big enough that spans across both vdevs. Despite sequentially writing data, the drive has to seek between two separate locations to save it. – Grogi Aug 17 '18 at 17:32
  • Is this behaviour the same if Pool_A uses (sda1, sdb1, sdc3) and Pool_B uses (sda2, sdb2, sdc3)? – sherpaurgen Aug 19 '18 at 14:51
  • You don't have to write to them simultaneously... If you have the vdevs in one pool, there is nothing you can do... – Grogi Aug 19 '18 at 19:12
  • Regarding _With ZoL, the only difference I am aware are is that ZoL will switch the disk scheduler to noop when whole disk was given to the vdev. Nothing stopping you from setting it manually._, may I ask if it is still true in 2022? How to check this? And how to set it manually? Thanks a lot. – midnite Jun 20 '22 at 04:43

It's better to use whole-disks with ZFS, when possible.
There's no need to partition in your use case.

ewwhite
  • What do you mean by `your use case`? Can you give some case where partitioning might be useful? – shivams Sep 15 '14 at 15:54
  • Partitioning is not useful in ZFS unless you have an esoteric configuration or are using certain SSD solutions or doing something odd with [ZIL and L2ARC devices](http://serverfault.com/questions/238675/zfs-how-to-partition-ssd-for-zil-or-l2arc-use). And even then, it's best to just use whole-devices/disks. – ewwhite Sep 15 '14 at 15:56
  • But why is it better to use whole disks? – leetNightshade Dec 18 '14 at 03:16
  • @leetNightshade Because there is not much point partitioning disks when you use ZFS, whose file systems do not need their own partition(s). Moreover, when ZFS "owns" the disk, it can enable the disk write cache to get better performance. See http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pools – jlliagre Apr 20 '15 at 09:47
  • @leetNightshade It is not better. – Grogi Nov 10 '16 at 11:17
  • @Grogi Any good evidence to back that up? – leetNightshade Nov 10 '16 at 19:28
  • When I set up ZFS for the first time, reading this answer gave me a misleading impression about the cost/risk of partitioning, so I decided to use whole disks. In fact my "esoteric" configuration (mirroring drives of different sizes) would have benefited from the use of partitions, and it seems from the other answers that there is no cost – "using whole disks" just means that ZFS automatically partitions them for you. – sjy Nov 16 '19 at 00:42
  • @sjy I'm pretty sure most people would advise using disks of equal sizes in a RAID array or a mirror or in a specific VDEV. – ewwhite Nov 16 '19 at 08:01

People here claim you need to use whole disks to get the disk-write-cache advantage, but that's not true. What is true is that ZFS will try to enable the disk write cache for you if you give it the whole disk. However, even if you give it just a single partition, nothing stops you from enabling the disk write cache yourself, in which case ZFS will detect that and honor it exactly the same way as if it had enabled the cache itself. So using whole disks is not a requirement for getting the write-cache advantage; it merely saves you the work of enabling the cache yourself.
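For example, you can query and enable the cache by hand; a sketch (device names are assumptions):

```shell
# FreeBSD: inspect the caching mode page (the WCE bit is the write cache
# enable flag) for a hypothetical device ada0:
camcontrol modepage ada0 -m 8

# Linux: query and enable the write cache with hdparm:
hdparm -W /dev/sda     # show the current setting
hdparm -W1 /dev/sda    # enable the write cache
```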

People here claim that disk access is faster if you give ZFS the whole disk. While that is technically true, the speed difference is so tiny that it is entirely negligible. If your ZFS setup is too slow, it will still be too slow when you use entire disks. And if it does well with entire disks, it would do equally well with single partitions.

One reason for not using whole disks is that disk sizes vary. Imagine the following case: you create a RAIDZ with 3 identical disks, each of them 4 TB, and you always use the full disk. After 3 years, one of the disks fails. No problem, you just buy a replacement – but meanwhile that disk has a new hardware revision, and the new 4 TB model is smaller by just 256 KB. It won't work! You cannot use it as a replacement, because it must be exactly the same size as your other two disks or bigger. If it is smaller, no matter by how few bytes, it won't work.

That's why I always create a partition and leave maybe 10 MB of the disk unused. Losing 10 MB per disk is close to nothing (0.00025% in the case of a 4 TB disk), and it means a replacement disk can be up to 10 MB smaller and still work (or have up to 10 MB of bad sectors, etc.).
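The overhead figure is easy to verify (using decimal units, as drive vendors do):

```python
# 10 MB of slack on a 4 TB disk, both in decimal (vendor) units.
slack_bytes = 10 * 10**6        # 10 MB
disk_bytes = 4 * 10**12         # 4 TB
overhead_pct = slack_bytes / disk_bytes * 100

print(f"{overhead_pct:.5f}%")   # prints 0.00025%
```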

Mecki

In my head this question arises from the doubt of whether I can later tell what is on a disk. When you create a pool on a whole drive (yes, with the -f option, if needed), zpool in practice creates a GPT partition table with Solaris-style partitions, like this:

(fdisk -l ...)
...
Disklabel type: gpt
Disk identifier: 4CBE587E-23AF-8E4B-A7F0-B44AD6083171

Device          Start        End    Sectors  Size Type
/dev/sdd1        2048 3907010559 3907008512  1,8T Solaris /usr & Apple ZFS
/dev/sdd9  3907010560 3907026943      16384    8M Solaris reserved 1

So there really is no need to create the partitions manually...
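In other words, a single command on the whole disk produces the layout shown above (the pool name and device are examples):

```shell
# ZFS writes the GPT and the two partitions (sdd1 + sdd9) itself:
zpool create -f tank /dev/sdd
```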

  • I do not consider what Solaris does to be best practice on FreeBSD. For example, in the case of mirrors you can replace disks with bigger ones and expand the pool size on the fly with one command, or even automatically if configured as such. – cstamas Nov 25 '18 at 00:56