
Spun off from these previously asked questions:

How to get free space from mounted drive Redhat 7

Update crypttab asks for Passphrase for fstrim

We have an HP 3PAR StoreServ 7400 with 170 VMs spread out across 38 hosts.

Here is the problem as I understand it. (I have also been told some information that I'm not sure is true or not; I have read over the HP 3PAR StoreServ 7400 whitepaper and really cannot find anything that backs up what my storage guy is telling me. So if anyone notices anything below that is not true, please let me know.)

The 3PAR is broken up into three layers:

Layer 1: SSD, used for caching and quick access to commonly accessed files.

Layers 2 and 3: Some kind of spinning disk. What the two additional layers are for I'm unsure, but my assumption is that Layer 2 is used for data that is not the most commonly accessed but is accessed a bit, and Layer 3 is used for storing the rest.

Within the SSD portion, as I have read in many articles, when data is written to an SSD block and then deleted, that block is not zeroed until new data is written to it. So when the data within the block is deleted, the table that stores the mapping info gets updated; then, when new data is written to that same block, the block first needs to be zeroed before it can be written to. If the drive is not trimmed periodically, this process can lead to lower read/write speeds on the SSD.

The 3PAR LUN is thin provisioned; the VMs are eager thick provisioned.

According to my storage guy, the 3PAR has a special feature built in that allows SSD storage that is not being used to be made available to the other VMs as needed, which makes no sense to me.

Fact Check:

A thick provisioned VM is a VMDK file; when the VM is created you specify its size, and this creates a VMDK file. In my mind that tells me that if the VM is being accessed regularly, the entire VMDK file is then moved to SSD. But what they are telling me is that even if the VMDK is set to use 40 GB, some of that 40 GB can be used by other VMs? That sounds more to me like a thin provisioned VM, not a thick one.

Ok getting to the problem.

On our Windows systems we use sdelete to find and zero out unused blocks.
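For reference, what we run on the Windows side is roughly the following (this assumes the Sysinternals SDelete tool; the drive letter is just an example):

    REM Zero out free space on C: so the storage underneath can reclaim it
    sdelete.exe -z c: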

On our Linux Fedora system I have been all over trying to figure out how to get fstrim to work.

I did try the dd write-big-file then delete-big-file approach, and that sent the disk I/O through the roof, which was noticed, and I was told not to do that again.

Doing a little research, it looks to me like sdelete does pretty much the same thing as the dd write-big-file/delete-big-file approach, so why does the disk I/O not go through the roof on the Windows systems?

So I think I have whittled it down to two solutions, neither of which I know how to do.

  1. Somehow, without vMotioning the VMs to a different storage array, be able to run an fstrim-like function on the entire SSD portion of the SAN.

Side note: If I understand everything I have read, fstrim checks every block to see whether data is there and whether it is needed; if it is not needed, it discards the block, whereas sdelete writes a huge file and then deletes it. That is why I am looking for an fstrim option across the entire SSD portion of the 3PAR.
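For what it's worth, this is how I have been checking whether the device even advertises discard support before running fstrim (assuming /dev/sda, as in the lsblk output further down):

    # Per-device discard capabilities; all-zero DISC-GRAN/DISC-MAX columns mean no discard support
    lsblk --discard /dev/sda

    # The same information from sysfs; 0 means the device will not accept discard/TRIM requests
    cat /sys/block/sda/queue/discard_max_bytes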

  2. Long shot, but the error I get with fstrim is:

[root@rhtest ~]# fstrim -v /
    fstrim: /: the discard operation is not supported

I have read that the discard option needs to be set on both the OS and the datastore, but I cannot figure out where or how to set a discard option on the 3PAR. I have both SSH and GUI access to the 3PAR.

I have been through countless walkthroughs on setting up discards within the OS, and no matter how many different ways I spin it I always get the same error.
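For completeness, the setup those walkthroughs describe is roughly the following; the UUID is my root filesystem from the blkid output below, and either way I still hit the same error because the device rejects discards:

    # /etc/fstab entry with the online discard mount option (XFS supports -o discard)
    UUID=ad872f09-5147-4252-af56-aa6244219515  /  xfs  defaults,discard  0 0

    # ...or skip the mount option and trim on a schedule instead (cron or a systemd timer)
    fstrim -v /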

Yes, I have also looked into other options; zerofree was one, and there were a couple of others that do not come to mind. However, they either worked like sdelete, or I read that they were very dangerous. I also looked into hdparm, etc.

Below I will put some output about the OS in question; they are all the same.

[root@rhtest ~]# hostnamectl
    Static hostname: rhtest.domain.com
    Icon name: computer-vm
    Chassis: vm
    Machine ID: f52e8e75ae704c579e2fbdf8e7a1d5ac
    Boot ID: 98ba6a02443d41cba9cf457acf5ed194
    Virtualization: vmware
    Operating System: Red Hat Enterprise Linux Server 7.2 (Maipo)
    CPE OS Name: cpe:/o:redhat:enterprise_linux:7.2:GA:server
    Kernel: Linux 3.10.0-327.el7.x86_64
    Architecture: x86-64

[root@rhtest ~]# blkid
    /dev/block/8:2: UUID="2OHGU8-ir1w-LLGB-6v72-zZqN-CIaX-FjGImJ" TYPE="LVM2_member"
    /dev/block/253:1: UUID="ad872f09-5147-4252-af56-aa6244219515" TYPE="xfs"
    /dev/block/8:1: UUID="83aac355-a443-4ff9-90fa-9f6da8e31cc2" TYPE="xfs"
    /dev/block/253:0: UUID="dbe56f6a-2a4a-42da-82e2-bef9a73caafb" TYPE="swap"

[root@rhtest ~]# lsblk
    NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    fd0                              2:0    1    4K  0 disk
    sda                              8:0    0   50G  0 disk
    ├─sda1                           8:1    0  500M  0 part /boot
    └─sda2                           8:2    0 49.5G  0 part
      ├─rhel_-rhtest-swap 253:0    0    2G  0 lvm  [SWAP]
      └─rhel_-rhtest-root 253:1    0 47.5G  0 lvm  /
    sdb                              8:16   0   50G  0 disk
    sr0                             11:0    1 1024M  0 rom


[root@rhtest ~]# df -h
    Filesystem                              Size  Used Avail Use% Mounted on
    /dev/mapper/rhel_-rhtest-root   48G  883M   47G   2% /
    devtmpfs                                991M     0  991M   0% /dev
    tmpfs                                  1001M     0 1001M   0% /dev/shm
    tmpfs                                  1001M  8.5M  993M   1% /run
    tmpfs                                  1001M     0 1001M   0% /sys/fs/cgroup
    /dev/sda1                               497M  124M  374M  25% /boot
    tmpfs                                   201M     0  201M   0% /run/user/0
– Anthony Fornito

2 Answers


Being able to run fstrim on the / partitions would be the best solution; however, with the way your ESXi is configured it is not possible.

You need to be able to enable discards on both the VM and the storage device.
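A quick way to confirm that on each side (the guest device name matches your lsblk output; the esxcli command is run on the ESXi host over SSH and assumes the array is presented through VAAI):

    # Inside the guest: 0 here means the virtual disk does not expose discard/UNMAP at all
    cat /sys/block/sda/queue/discard_max_bytes

    # On the ESXi host: check whether the backing devices report VAAI Delete (UNMAP) support
    esxcli storage core device vaai status get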

Reducing the size of a partition or logical volume with the XFS filesystem cannot be done; this is a known limitation with Fedora/RHEL. If you are interested in this functionality, please contact Red Hat support and reference Red Hat Bugzilla 1062667, and provide your use case for needing XFS reduction/shrinking.

As a possible workaround in some environments, thin provisioned LVM volumes can be considered as an additional layer below the XFS filesystem.
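A minimal sketch of that layering, assuming a volume group named vg_data with free extents (all names and sizes here are placeholders):

    # Create a thin pool inside the volume group
    lvcreate --type thin-pool -L 40G -n pool0 vg_data

    # Carve a thin volume out of the pool; it can be over-provisioned
    lvcreate --thin -V 100G -n lv_app vg_data/pool0

    # Put XFS on top; blocks freed by fstrim/discard are returned to the thin pool
    mkfs.xfs /dev/vg_data/lv_app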

The VMs are eager thick provisioned VMDKs, which means that there is nothing to reclaim when you are attempting to trim (technically speaking, SCSI UNMAP) your volumes.

If the back-end storage is running thin provisioning, then you also need to use lazy zeroed VMDK files in order to reduce the storage used and make it possible for the back end to cache/dedupe the warm data.
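Converting an existing disk is normally done offline with vmkfstools; a rough sketch, with the VM powered off and the datastore/VM paths made up (use -d zeroedthick instead of -d thin if you specifically want lazy-zeroed thick):

    # Clone the eager-zeroed disk to a new copy in the target format, then
    # re-point the VM at the new disk once it has been verified
    vmkfstools -i /vmfs/volumes/datastore1/rhtest/rhtest.vmdk \
               -d thin \
               /vmfs/volumes/datastore1/rhtest/rhtest-thin.vmdk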

Two possible options:

1. When storage is provided by a remote server across a SAN, you can only discard blocks if the storage is thin provisioned.

    1. vMotion all the VMs to a different datastore and use the built-in VMware tools
    2. Connect to the ESXi host with SSH
    3. Navigate to the virtual machine folder
    4. Verify disk usage with du
    5. Run vmkfstools -K [disk] (a rough session sketch is shown below)
    6. Verify disk usage with du

2.  dd if=/dev/zero of=BIGFILE bs=1024000
    rm -f BIGFILE

From what I can tell, this does the same thing as sdelete; however, it can cause a spike in disk I/O as well as take a while to run.

Something to try overnight
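For option 1, the session on the ESXi host could look roughly like this (the datastore and VM folder names are made up, and the VM should be powered off while vmkfstools -K runs):

    cd /vmfs/volumes/datastore1/rhtest    # hypothetical datastore/VM folder
    du -h rhtest-flat.vmdk                # space actually consumed, before
    vmkfstools -K rhtest.vmdk             # punch out (reclaim) zeroed blocks from the disk
    du -h rhtest-flat.vmdk                # space actually consumed, after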

Neither option is the best, but reformatting every VM to get ext3 or ext4 does not sound feasible.

What you might be able to do is set up an affinity rule for all Linux VMs and use option 1 from above.

– ewwhite
– Brian Curless

You are using eager thick provisioned VMDKs, which means that there is nothing to reclaim when you are attempting to trim (technically speaking, SCSI UNMAP) your volumes.

If the back-end storage is running thin provisioning, then you also need to use lazy zeroed VMDK files in order to reduce the storage used and make it possible for the back end to cache/dedupe the warm data.

– pauska
  • Thank you for answering; however, I'm not sure I completely understand your answer. If all my assumptions from the question are correct, there would be a need to reclaim the non-zero blocks from the SAN, especially if the VMDK file is moved out of SSD to spinning disk. Correct? – Anthony Fornito Oct 31 '16 at 19:34
  • @AnthonyFornito You cannot reclaim anything at all with eager thick disks. Eager thick means that VMware forces the backend storage to write the full allocation of each file, including zeroes. – pauska Nov 01 '16 at 00:10
  • @pauska is totally correct. 3PAR and a lot of similar solutions are designed for compression/deduplication/tiering. Your hybrid 3PAR model is more about capacity efficiency and not really a performance-oriented configuration. That's why it is better to use lazy zeroed disks instead of eager zeroed ones in your case. – Strepsils Nov 09 '16 at 18:43