
I'm working on a data stream processing project in which I'll be using Apache Flink and Apache Spark, and I want to use HDFS for storage. Development and testing will be done on a single-node cluster with multiple physical disks.

I have already checked this question and this white paper, but I'm not sure how applicable they are to my scenario, and I'm still torn between giving HDFS the disks as separate ext4 volumes or pooling them all with ZFS.

I want to know how these two options compare in terms of performance and protection against data loss, and which approach is recommended.
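For context, the ext4-per-disk (JBOD) option as I understand it would look something like this in `hdfs-site.xml` — the mount points are placeholders for my actual disks:

```xml
<!-- hdfs-site.xml: sketch of the "one ext4 filesystem per disk" layout.
     Mount points below are hypothetical; substitute real ones. -->
<configuration>
  <property>
    <!-- Comma-separated list of directories, one per physical disk;
         the DataNode spreads block writes across them. -->
    <name>dfs.datanode.data.dir</name>
    <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data</value>
  </property>
  <property>
    <!-- With a single node there is only one DataNode, so replicas cannot
         land on different machines; replication > 1 buys little here. -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- Optionally keep the DataNode alive after one disk fails. -->
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
  </property>
</configuration>
```

The ZFS option would instead expose a single pool (e.g. RAID-Z or mirrored vdevs) as one `dfs.datanode.data.dir` entry, with redundancy handled below HDFS.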

HUSMEN
  • Possible duplicate of [zfs for Hadoop cloud instead of ext4](https://serverfault.com/questions/771616/zfs-for-hadoop-cloud-instead-of-ext4) – John Mahowald Oct 08 '19 at 23:11
  • @JohnMahowald I've seen that question and it refers to the white paper I linked, but my use case is a little different as I mentioned in my question. – HUSMEN Oct 08 '19 at 23:21

0 Answers