I'm working on a data stream processing project in which I will be using Apache Flink and Apache Spark, and I want to use HDFS for storage. Development and testing will be done on a single-node cluster with multiple physical disks.
I have already checked this question and this white paper, but I am not sure how applicable they are to my scenario, and I am still undecided between using the disks as separate EXT4 volumes with HDFS and creating a single pool with ZFS.
I want to know how these two options compare in terms of performance and protection against data loss, and which approach would be recommended.