
It is well known that erasure coding adds extra complexity because of its encoding and decoding operations. Due to this drawback, most cloud services recommend using data replication for hot data and erasure coding for cold data.

For example, from the Ceph documentation:

The erasure-coded pool crush ruleset targets hardware designed for cold storage with high latency and slow access time. The replicated pool crush ruleset targets faster hardware to provide better response times.

Is there a better definition of hot data than "data that is accessed more often than other data"?

Let us consider a storage system that relies on erasure coding and an application running on top of it with an intensive I/O workload. Is that considered hot data?

Now, how can I tell whether my storage system's erasure code is viable or not? Is it relevant to measure IOPS from the application side for specific tests (e.g. random/sequential reads/writes)?

Is there a threshold below which erasure codes are considered not viable for hot data, for example because I only measure one hundred IOPS application-side for random writes of 4 kB blocks? What if I measure one hundred billion IOPS?

Are IOPS even relevant for this kind of test (maybe another metric would say more)?
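For concreteness, here is a minimal sketch of what I mean by an application-side measurement of random 4 kB write IOPS. The file path, working-set size and duration are made up, and a dedicated tool such as fio would of course do the same job more rigorously:

```python
# Minimal sketch: measure application-side IOPS for random 4 kB writes.
# Assumption: "testfile" lives on the erasure-coded storage under test;
# fsync after each write is used as a crude way to push I/O past the page cache.
import os
import random
import time

PATH = "testfile"                # hypothetical file on the storage under test
BLOCK_SIZE = 4096                # 4 kB blocks, as in the question
FILE_SIZE = 256 * 1024 * 1024    # 256 MiB working set
DURATION = 10                    # seconds to run the test

block = os.urandom(BLOCK_SIZE)
fd = os.open(PATH, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, FILE_SIZE)

ops = 0
deadline = time.monotonic() + DURATION
while time.monotonic() < deadline:
    # pick a random block-aligned offset and overwrite that block
    offset = random.randrange(FILE_SIZE // BLOCK_SIZE) * BLOCK_SIZE
    os.pwrite(fd, block, offset)
    os.fsync(fd)                 # force the write down to the storage system
    ops += 1

os.close(fd)
print(f"random 4 kB write IOPS (application side): {ops / DURATION:.0f}")
```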

I am full of questions about this, and I would be grateful for any help.

denaitre
1 Answer


To make erasure coding viable for hot data, you need an erasure code that works well on a data block size matching the one used by the filesystem (typically 4K). However, this is not enough: you also have to think about the architecture of the filesystem, and in particular the potential impact on metadata (typically, which servers hold each block's fragments, and so on).
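To make the block-size point concrete, here is a toy sketch (not RozoFS code, just a trivial single-parity XOR scheme) showing one 4K filesystem block being split into k data fragments plus a parity fragment. These fragments are the units the metadata layer then has to track across servers:

```python
# Toy illustration only: split one 4 kB filesystem block into K data
# fragments plus a single XOR parity fragment (a trivial erasure code
# tolerating the loss of one fragment). Real systems use stronger codes.
import os
from functools import reduce

BLOCK_SIZE = 4096
K = 4                                  # number of data fragments
FRAG_SIZE = BLOCK_SIZE // K

def encode(block):
    """Return K data fragments plus one XOR parity fragment."""
    assert len(block) == BLOCK_SIZE
    frags = [block[i * FRAG_SIZE:(i + 1) * FRAG_SIZE] for i in range(K)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))
    return frags + [parity]            # K + 1 fragments to place on servers

def decode(frags):
    """Rebuild the 4 kB block even if exactly one fragment is missing (None)."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if missing:
        present = [f for f in frags if f is not None]
        rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*present))
        frags[missing[0]] = rebuilt
    return b"".join(frags[:K])

block = os.urandom(BLOCK_SIZE)
fragments = encode(block)
fragments[2] = None                    # simulate losing one server's fragment
assert decode(fragments) == block      # the metadata must know where every fragment lives
```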

So, to build a filesystem that uses erasure coding, it is better to design the filesystem around the erasure code rather than simply bolting erasure coding onto an existing filesystem.

One common drawback of erasure coding is CPU time, and most Reed-Solomon based implementations work on large block sizes to compensate for the throughput issue. For that reason, most of them target archiving only.
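If you want to quantify that CPU cost yourself, a rough sketch along these lines can time Reed-Solomon encoding at a few block sizes. It assumes the third-party Python package reedsolo (a pure-Python codec, so the absolute numbers will be far below tuned native implementations, but the relative effect of block size still shows up):

```python
# Rough sketch: measure Reed-Solomon encode throughput at several block sizes.
# Assumption: the third-party 'reedsolo' package is installed (pip install reedsolo).
import os
import time
from reedsolo import RSCodec

rsc = RSCodec(32)                           # 32 parity symbols per 255-byte codeword

for block_size in (4 * 1024, 64 * 1024, 1024 * 1024):
    data = os.urandom(block_size)
    t0 = time.perf_counter()
    iterations = 0
    while time.perf_counter() - t0 < 2.0:   # encode repeatedly for ~2 seconds
        rsc.encode(data)
        iterations += 1
    elapsed = time.perf_counter() - t0
    mib_per_s = iterations * block_size / elapsed / (1024 * 1024)
    print(f"block {block_size // 1024:5d} kB: {mib_per_s:8.2f} MiB/s encode throughput")
```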

However, there are alternatives that work on small data blocks (4K). In our scale-out NAS product (RozoFS) we use a different erasure coding algorithm (geometric, as opposed to the algebraic approach of Reed-Solomon) that provides fast encoding/decoding on small block sizes (more than 10 GB/s on an Intel i7 @ 2 GHz).

The encoding/decoding speed, combined with the fact that the code operates on block sizes in the range used by the filesystem, avoids the penalty of extra reads on random write requests. This lets us address live data, and in particular random I/O on small block sizes.
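A simplified way to see why that matters for random writes is to count the device I/Os triggered by one 4 kB update. This is toy accounting only (the exact read-modify-write strategy depends on the implementation; the stripe sizes, k and parity counts below are made up for illustration):

```python
# Toy accounting of the extra I/O caused by one random 4 kB write, comparing
# a code whose stripe matches the filesystem block (4 kB) with one that
# stripes over a much larger unit (e.g. 1 MiB). Pure arithmetic, no real I/O.
FS_BLOCK = 4 * 1024

def ios_for_random_write(stripe_size, k, parity):
    """Count device reads/writes needed to update one 4 kB filesystem block."""
    if stripe_size <= FS_BLOCK:
        # Full-stripe write: re-encode the 4 kB block and write k + parity fragments,
        # with no reads standing in front of the write.
        return {"reads": 0, "writes": k + parity}
    # Partial-stripe write: read the k-1 untouched data fragments to recompute
    # parity (one common strategy; reading old data + old parity is another),
    # then write the modified data fragment and the parity fragments.
    # The reads must complete before the write, adding latency.
    return {"reads": k - 1, "writes": 1 + parity}

print("4 kB stripe :", ios_for_random_write(4 * 1024, k=4, parity=2))
print("1 MiB stripe:", ios_for_random_write(1024 * 1024, k=4, parity=2))
```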

If you are interested in the performance details, we have posted the benchmarks we ran with IOzone (sequential and random access tests) on our website (rozofs.com, blog section).