Any benefits of ZFS over EXT4 for data stream processing on top of HDFS?

Asked Oct 08 '19 at 15:36

I'm working on a data stream processing project in which I will be using Apache Flink and Apache Spark, and I want to use HDFS for storage. Development and testing will be done on a single-node cluster with multiple physical disks.

I have already checked this question and this white paper, but I am not sure how applicable they are to my scenario, and I am still undecided between using the disks as separate EXT4 volumes under HDFS and combining them into a single ZFS pool.
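To make the two layouts concrete, here is a rough sketch of how I understand they would show up in `hdfs-site.xml`. The `dfs.datanode.data.dir` property is the standard HDFS setting that takes a comma-separated list of DataNode storage directories; the mount points (`/mnt/disk1`, ...), the pool name `tank`, the `hdfs/data` subdirectory and the helper function are placeholders I made up for illustration, not my actual setup.

```python
from xml.sax.saxutils import escape


def datanode_dirs_property(mount_points, subdir="hdfs/data"):
    """Render the hdfs-site.xml property listing DataNode storage directories.

    mount_points and subdir are hypothetical values used only for illustration.
    """
    dirs = ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)
    return (
        "<property>\n"
        "  <name>dfs.datanode.data.dir</name>\n"
        f"  <value>{escape(dirs)}</value>\n"
        "</property>"
    )


# Option A: every physical disk is its own EXT4 volume, and HDFS is given
# one storage directory per disk.
ext4_layout = datanode_dirs_property(["/mnt/disk1", "/mnt/disk2", "/mnt/disk3"])

# Option B: all disks belong to one ZFS pool (assumed here to be mounted at
# /tank), so HDFS only sees a single storage directory.
zfs_layout = datanode_dirs_property(["/tank"])

print(ext4_layout)
print(zfs_layout)
```

In option A, HDFS itself spreads blocks across the listed directories; in option B, the placement across disks and any redundancy are left entirely to ZFS.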

I want to know how these two options compare in terms of performance and protection against data loss, and which approach is recommended.

edited Oct 08 '19 at 23:24

HUSMEN

Possible duplicate of [zfs for Hadoop cloud instead of ext4](https://serverfault.com/questions/771616/zfs-for-hadoop-cloud-instead-of-ext4) – John Mahowald Oct 08 '19 at 23:11
@JohnMahowald I've seen that question and it refers to the white paper I linked, but my use case is a little different as I mentioned in my question. – HUSMEN Oct 08 '19 at 23:21
