
I have a cluster of 4 servers. One of the namespaces is raw device based. The devices reside on a SAS mechanical hard drive.

Now here is the weird part of the story. I am running one of the tests with small records (2x50 bytes = 100 bytes total). I can write at 150k to 200k OPS. But when it comes to reading, the throughput drops to 4k OPS! Yes, I know, this is mighty weird, and I am totally confused.

The servers show very little load during the read. The iotop and nload show nothing I can identify as a problem.

Here is the device config:

namespace test-raw {
        replication-factor 4
        memory-size 16G
        default-ttl 7200
        max-ttl 2D
        high-water-disk-pct 80
        high-water-memory-pct 60
        stop-writes-pct 90
        partition-tree-locks 64
        partition-tree-sprigs 4096

        storage-engine device {
                device /dev/sdb1
                write-block-size 1M
                max-write-cache 8G
                data-in-memory false
                cold-start-empty true
        } 
}

Any insight would be much appreciated.

Cheers,

Boris.

  • 2
    New writes (as opposed to updates) will not incur disk i/o on the transaction path, contrary to reads. –  Jan 31 '18 at 18:24
  • 2
    To clarify, by 'transaction path' I meant from the perspective of the client's latency. Ronen has provided nice links with all the details. Bottom line: it is misleading to compare latency between new writes and reads. –  Jan 31 '18 at 19:25

2 Answers


You should not use an HDD as your main storage device with Aerospike, as you'll be missing out on all the low-level optimizations targeting SSDs. HDDs are not built to handle a large number of concurrent reads, whereas this is one of the main advantages of SSDs. The only place an HDD is appropriate in Aerospike is as a persistence layer for an in-memory namespace. Your namespace stores its data on the device, so that device should be a decent enterprise-grade (AKA DC-quality) SSD.

See Comparing SSD performance based on "config recipe" and the following from the Frequently Asked Questions (FAQ):

Can I store data on hard disk rather than SSD?

No. The Aerospike database is intended to be a high performance, low-latency database. Because of this, the physical limitations of rotational disks add an unacceptable amount of latency to the data.

Now for some quick fixes:

Ronen Botzer
  • 1
    Cross posted on the Aerospike discussion forum: https://discuss.aerospike.com/t/very-odd-raw-device-behaviour/4902 – Ronen Botzer Jan 31 '18 at 22:31

...since your records are only 100 bytes, you are probably using 256 bytes per record (with overhead and rounding to a 128-byte boundary). With write-block-size at 1 MB (the default), you fit about 4K records into a 1 MB block in RAM while writing, and that block is flushed to disk asynchronously. On read, you are fetching individual records from disk in 128-byte chunks. If you read a recently updated record, you probably get it from the post-write queue in RAM; otherwise you go to the disk. So your read delay comes from the slow performance of the disk for records that have to be fetched from it.

If write-block-size were 128K, you would fit about 500 records per block. You can play with write-block-size on a test cluster and see if the performance tracks. Check the write-q value in /var/log/aerospike/aerospike.log to see if the disk is slow: if the disk is not the bottleneck, write-q will be zero under write throughput. You have a very large max-write-cache of 8G (64M is the default), which also helps your writes. You can also try reducing post-write-queue to a very small number and see whether read throughput gets worse.
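The records-per-block arithmetic in this answer can be checked with a rough sketch. The 64-byte per-record overhead below is an assumption chosen only so that a 100-byte record rounds up to the 256 bytes estimated above; the actual overhead depends on the server version and record layout. The function names are illustrative and not part of any Aerospike API.

```python
def rounded_record_size(payload_bytes, overhead_bytes, boundary=128):
    """Round payload + overhead up to the next multiple of `boundary` bytes."""
    raw = payload_bytes + overhead_bytes
    return -(-raw // boundary) * boundary  # ceiling division, then scale back up


def records_per_block(block_bytes, record_bytes):
    """How many whole records fit in one write block."""
    return block_bytes // record_bytes


# Assumption: 64 bytes of overhead per record (hypothetical figure).
record_bytes = rounded_record_size(payload_bytes=100, overhead_bytes=64)

for label, block_bytes in (("1M", 1024 * 1024), ("128K", 128 * 1024)):
    count = records_per_block(block_bytes, record_bytes)
    print(f"write-block-size {label}: {count} records of {record_bytes} bytes")
```

Under these assumptions a 1M block holds 4096 records and a 128K block holds 512, which matches the "about 4K" and "about 500" figures above.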

pgupta