I'm in charge of downloading and processing large amounts of financial data. Each trading day we have to add around 100 GB.
To handle this amount of data, we rent a virtual server (3 cores, 12 GB RAM) and a 30 TB block device from our university's data center.
On the virtual machine I installed Ubuntu 16.04 and ZFS on Linux, then created a ZFS pool on the 30 TB block device. The main reason for using ZFS is its compression feature, since the data compresses well (to roughly 10% of its original size). Please don't be too hard on me for not following the golden rule that ZFS wants to see bare metal; I am forced to use the infrastructure as it is.
The reason for posting is that I'm seeing poor write speeds: the server reads from the block device at about 50 MB/s, but writing is painfully slow at about 2-4 MB/s.
Here is some information on the pool and the dataset:
zdb

tank:
    version: 5000
    name: 'tank'
    state: 0
    txg: 872307
    pool_guid: 8319810251081423408
    errata: 0
    hostname: 'TAQ-Server'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 8319810251081423408
        children[0]:
            type: 'disk'
            id: 0
            guid: 13934768780705769781
            path: '/dev/disk/by-id/scsi-3600140519581e55ec004cbb80c32784d-part1'
            phys_path: '/iscsi/disk@0000iqn.2015-02.de.uni-konstanz.bigdisk%3Asn.606f4c46fd740001,0:a'
            whole_disk: 1
            metaslab_array: 30
            metaslab_shift: 38
            ashift: 9
            asize: 34909494181888
            is_log: 0
            DTL: 126
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
zpool get all
NAME PROPERTY VALUE SOURCE
tank size 31.8T -
tank capacity 33% -
tank altroot - default
tank health ONLINE -
tank guid 8319810251081423408 default
tank version - default
tank bootfs - default
tank delegation on default
tank autoreplace off default
tank cachefile - default
tank failmode wait default
tank listsnapshots off default
tank autoexpand off default
tank dedupditto 0 default
tank dedupratio 1.00x -
tank free 21.1T -
tank allocated 10.6T -
tank readonly off -
tank ashift 0 default
tank comment - default
tank expandsize 255G -
tank freeing 0 default
tank fragmentation 12% -
tank leaked 0 default
tank feature@async_destroy enabled local
tank feature@empty_bpobj active local
tank feature@lz4_compress active local
tank feature@spacemap_histogram active local
tank feature@enabled_txg active local
tank feature@hole_birth active local
tank feature@extensible_dataset enabled local
tank feature@embedded_data active local
tank feature@bookmarks enabled local
tank feature@filesystem_limits enabled local
tank feature@large_blocks enabled local
zfs get all tank/test
NAME PROPERTY VALUE SOURCE
tank/test type filesystem -
tank/test creation Thu Jul 21 10:04 2016 -
tank/test used 19K -
tank/test available 17.0T -
tank/test referenced 19K -
tank/test compressratio 1.00x -
tank/test mounted yes -
tank/test quota none default
tank/test reservation none default
tank/test recordsize 128K default
tank/test mountpoint /tank/test inherited from tank
tank/test sharenfs off default
tank/test checksum on default
tank/test compression off default
tank/test atime off local
tank/test devices on default
tank/test exec on default
tank/test setuid on default
tank/test readonly off default
tank/test zoned off default
tank/test snapdir hidden default
tank/test aclinherit restricted default
tank/test canmount on default
tank/test xattr on default
tank/test copies 1 default
tank/test version 5 -
tank/test utf8only off -
tank/test normalization none -
tank/test casesensitivity mixed -
tank/test vscan off default
tank/test nbmand off default
tank/test sharesmb off default
tank/test refquota none default
tank/test refreservation none default
tank/test primarycache all default
tank/test secondarycache all default
tank/test usedbysnapshots 0 -
tank/test usedbydataset 19K -
tank/test usedbychildren 0 -
tank/test usedbyrefreservation 0 -
tank/test logbias latency default
tank/test dedup off default
tank/test mlslabel none default
tank/test sync disabled local
tank/test refcompressratio 1.00x -
tank/test written 19K -
tank/test logicalused 9.50K -
tank/test logicalreferenced 9.50K -
tank/test filesystem_limit none default
tank/test snapshot_limit none default
tank/test filesystem_count none default
tank/test snapshot_count none default
tank/test snapdev hidden default
tank/test acltype off default
tank/test context none default
tank/test fscontext none default
tank/test defcontext none default
tank/test rootcontext none default
tank/test relatime off default
tank/test redundant_metadata all default
tank/test overlay off default
tank/test com.sun:auto-snapshot true inherited from tank
Can you give me a hint what I could do to improve the write speeds?
Update 1
After your comments about the storage system, I went to the IT department. They told me that the logical block size the vdev exports is actually 512 B.
This is the output of dmesg:
[ 8.948835] sd 3:0:0:0: [sdb] 68717412272 512-byte logical blocks: (35.2 TB/32.0 TiB)
[ 8.948839] sd 3:0:0:0: [sdb] 4096-byte physical blocks
[ 8.950145] sd 3:0:0:0: [sdb] Write Protect is off
[ 8.950149] sd 3:0:0:0: [sdb] Mode Sense: 43 00 10 08
[ 8.950731] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 8.985168] sdb: sdb1 sdb9
[ 8.987957] sd 3:0:0:0: [sdb] Attached SCSI disk
So 512 B logical blocks but 4096 B physical blocks?!
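For reference, the kernel's view of both block sizes can be queried directly (standard sysfs and blockdev interfaces; sdb is the device name from the dmesg output above):

cat /sys/block/sdb/queue/logical_block_size    # expect 512
cat /sys/block/sdb/queue/physical_block_size   # expect 4096
blockdev --getss --getpbsz /dev/sdb            # the same two values via blockdev

This mismatch matters for ZFS: the original pool was created with ashift: 9 (512 B sectors, as shown in the zdb output above), so sub-4-KiB writes presumably force read-modify-write cycles on the 4 KiB physical sectors.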
They are providing me with a temporary file system to which I can back up the data. I will then test the speed on the raw device before setting up the pool from scratch, and will post an update.
Update 2
I destroyed the original pool.
Then I ran some speed tests using dd; the results are OK, around 80 MB/s in both directions.
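For reference, raw-device throughput can be measured along these lines (a sketch, not the exact invocations; the device name comes from the dmesg output above, the sizes are assumptions, and writing to the raw device is destructive, which was acceptable here since the pool had already been destroyed):

dd if=/dev/sdb of=/dev/null bs=1M count=4096 iflag=direct    # sequential read
dd if=/dev/zero of=/dev/sdb bs=1M count=4096 oflag=direct    # sequential write (destroys data!)

The direct flags bypass the page cache so the numbers reflect the device rather than RAM.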
As a further check, I created an ext4 partition on the device and copied a large zip file to it; the average write speed is around 40 MB/s. Not great, but enough for my purposes.
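Since plain cp doesn't report throughput, a tool like pv makes the rate visible during such a copy (pv and the file names here are assumptions, not what was actually run):

pv big-file.zip > /mnt/ext4test/big-file.zip    # shows live and average throughput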
I continued by creating a new storage pool with the following commands:
zpool create -o ashift=12 tank /dev/disk/by-id/scsi-3600140519581e55ec004cbb80c32784d
zfs set compression=on tank
zfs set atime=off tank
zfs create tank/test
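To verify that the new pool actually uses 4 KiB sectors, the ashift can be checked afterwards (standard commands for an imported pool; tank is the pool created above):

zpool get ashift tank        # should now report 12 instead of the earlier default 0
zdb -C tank | grep ashift    # ashift as stored in the cached pool config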
Then I again copied a zip file to the newly created test file system. The write speed is still poor, just around 2-5 MB/s.
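To see where the time goes while such a copy runs, the pool can be watched live (standard ZoL interfaces; the txgs kstat path applies to ZFS on Linux and may vary by version):

zpool iostat -v tank 1               # per-vdev bandwidth and IOPS, refreshed every second
cat /proc/spl/kstat/zfs/tank/txgs    # recent transaction groups and how long each took to sync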
Any ideas?
Update 3
The txg_sync thread is blocked while I copy the files. I opened an issue on the GitHub repository of ZoL.
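For anyone trying to confirm the same symptom, the blocked thread shows up in the kernel's hung-task warnings, and its kernel stack can be inspected (standard dmesg/procfs checks, run as root; the pgrep pattern is an assumption):

dmesg | grep -i 'blocked for more than'    # hung-task warnings naming txg_sync
cat /proc/$(pgrep txg_sync)/stack          # kernel stack of the stuck thread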