
I'm trying to determine whether there is any practical advantage to configuring a RAID array on the instance store of three d2.2xlarge instances being used for HDFS. Initially I planned to just mount each store and add it as an additional data directory for Hadoop, but it seems there could be some additional performance gains with a RAID 0 or RAID 10 configuration. Since durability is handled by HDFS itself, there is no need to consider RAID 1 or 5 from that perspective (e.g. if one or all stores failed on an instance, durability is provided by replication from the other data nodes). RAID 6 seems impractical due to known issues with long rebuild times and reduced write throughput from the double parity writes (again, it seems best to let HDFS handle durability). That leaves RAID 0 and RAID 10, both of which theoretically offer better disk I/O than a single HDD. Would HDFS see observable performance gains on a RAID array for the instance store?

John R

1 Answer

Honestly, using RAID for HDFS is not recommended. There are two threads on the Cloudera community portal about this:

https://community.cloudera.com/t5/Support-Questions/Should-we-use-RAID-with-Hadoop/td-p/201381

https://community.cloudera.com/t5/Support-Questions/Do-we-config-our-hadoop-right-JBOD-vs-RAID/td-p/187997
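For the non-RAID (JBOD) layout described in the question, each instance-store volume is mounted separately and listed in `dfs.datanode.data.dir`, and the DataNode spreads blocks across the listed directories. Here is a minimal sketch that builds that property. The mount points (`/mnt/disk0` to `/mnt/disk5`), the `hdfs/data` subdirectory, and the count of six volumes per d2.2xlarge are assumptions for illustration, and both helper functions are hypothetical names, so adjust them to your actual setup.

```python
from xml.sax.saxutils import escape


def data_dir_value(mount_points, subdir="hdfs/data"):
    """Return one data directory per mount point, comma-separated.

    `subdir` is a hypothetical directory name, not a Hadoop default.
    """
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


def hdfs_site_property(mount_points):
    """Render the <property> element to paste into hdfs-site.xml."""
    value = data_dir_value(mount_points)
    return (
        "<property>\n"
        "  <name>dfs.datanode.data.dir</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Assumed layout: six instance-store volumes mounted at /mnt/disk0../mnt/disk5.
mounts = [f"/mnt/disk{i}" for i in range(6)]
print(hdfs_site_property(mounts))
```

With this layout a single failed disk only takes out the blocks stored on that disk, and HDFS re-replicates them from the other data nodes.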

Regarding this point.