
Right now I have a couple of Linodes running ext4, with a Hadoop setup on them. What benefit would I get if I migrated the file system from ext4 to ZFS?

  • Will there be any benefit in response times?
  • Any speed improvement when data is exchanged over the LAN?
  • If I add a new Linode to my cloud, will the sync time be shorter than with ext4?

Also, what are the downsides?

M-BoB


From the white paper by Adurant:

The benefits of this configuration include:

  • Reduced Hadoop cluster overhead by reducing the replication factor to 2x
  • Reduced storage (disk space) requirements by reducing the replication factor to 2x
  • Increased the number of copies of data to 4x via the ZFS Storage Appliance
  • Added data compression via the ZFS Storage Appliance, further reducing storage space requirements even in a mirrored pool configuration
  • Added read and write caching via the ZFS Storage Appliance decreasing I/O response times
  • Added data protection (RAID 1) with no added overhead to the Hadoop cluster
  • Added fault tolerance via the ZFS Storage Appliance’s clustered heads
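The replication arithmetic behind the first few points can be made concrete. This is a minimal sketch, not taken from the paper: it assumes an ext4 baseline using HDFS's default replication factor of 3 with no compression, and plugs in the paper's figures for the ZFS case (2x HDFS replication on a two-way mirrored pool, plus the 3.5x compression the results report as a minimum). The `StorageLayout` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLayout:
    """Hypothetical description of how one logical byte ends up on disk."""
    hdfs_replication: int           # HDFS dfs.replication factor
    mirror_ways: int = 1            # copies kept by the storage pool (1 = no mirroring)
    compression_ratio: float = 1.0  # logical size / stored size

    def copies(self) -> int:
        # Every HDFS replica is itself mirrored by the storage layer.
        return self.hdfs_replication * self.mirror_ways

    def raw_bytes(self, logical_bytes: float) -> float:
        # Raw disk consumed: each copy stores the compressed size.
        return logical_bytes * self.copies() / self.compression_ratio


if __name__ == "__main__":
    tib = 1.0  # one TiB of logical data
    ext4 = StorageLayout(hdfs_replication=3)  # assumed baseline, HDFS default
    zfs = StorageLayout(hdfs_replication=2, mirror_ways=2, compression_ratio=3.5)
    for name, layout in (("ext4, 3x HDFS", ext4), ("ZFS mirror, 2x HDFS", zfs)):
        print(f"{name}: {layout.copies()} copies, "
              f"{layout.raw_bytes(tib):.2f} TiB raw per TiB of data")
```

Under these assumptions the ZFS layout keeps 4 copies in about 1.14 TiB of raw space versus 3 copies in 3.00 TiB on ext4. The advantage depends entirely on compression: at 1x compression the mirrored layout would need 4 TiB.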

And the results:

The findings of the Hadoop ZFS Proof of Concept testing clearly indicate that the ZFS Storage Appliance is more than able to handle current Hadoop workloads. Data processing was CPU bound, memory utilization was nominal, I/O utilization was nominal, and data was compressed by a minimum of 3.5x.
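The 3.5x figure reflects the paper's data, so one quick way to gauge your own is to compress a sample file roughly the way ZFS does, one record at a time. This is a rough sketch under stated assumptions: Python's `zlib` stands in for ZFS's gzip compression, 128 KiB is assumed as the recordsize, and the helper names are hypothetical. It ignores sector-size rounding, and LZ4 (another ZFS option) typically compresses less than gzip, so the estimate is likely on the optimistic side.

```python
import sys
import zlib
from typing import Iterable, Iterator

# Assumed ZFS recordsize; 128 KiB is the common default but is tunable per dataset.
RECORD_SIZE = 128 * 1024


def read_records(path: str, record_size: int = RECORD_SIZE) -> Iterator[bytes]:
    """Yield the file in fixed-size records, like ZFS splitting a file into blocks."""
    with open(path, "rb") as f:
        while record := f.read(record_size):
            yield record


def compression_ratio(records: Iterable[bytes], level: int = 6) -> float:
    """Return logical size / stored size when each record is compressed alone.

    Simplification: a record is stored compressed only if that is smaller;
    real ZFS also requires a minimum saving and rounds up to the sector size.
    """
    logical = stored = 0
    for record in records:
        logical += len(record)
        stored += min(len(zlib.compress(record, level)), len(record))
    return logical / stored if stored else 1.0


if __name__ == "__main__":
    # Usage: python estimate_ratio.py sample1 [sample2 ...]
    for path in sys.argv[1:]:
        print(f"{path}: ~{compression_ratio(read_records(path)):.2f}x")
```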

Of course, things like compression efficiency depend largely on your data, and performance depends not only on the design but also on the actual hardware. The document also describes the setup in detail, so you could replicate it on a smaller scale with fewer nodes and a portion of your real data, and run your own benchmarks.
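For the benchmarking suggestion, a very small starting point is to time copying a sample of your real data onto each candidate file system, with an `fsync` so the write is not just sitting in memory. This is a hypothetical helper, not a substitute for running real Hadoop jobs: it measures single-stream sequential writes only, and reading the source file is included in the timing. Point the target directory at an ext4 mount on one node and a ZFS dataset on another, and compare.

```python
import os
import sys
import time

MIB = 1024 * 1024
CHUNK = MIB  # 1 MiB copy buffer (arbitrary choice)


def copy_throughput(sample: str, target_dir: str) -> float:
    """Copy `sample` into `target_dir`, fsync it, delete it, and return MiB/s."""
    dest = os.path.join(target_dir, "bench_" + os.path.basename(sample))
    copied = 0
    start = time.perf_counter()
    with open(sample, "rb") as src, open(dest, "wb") as dst:
        while chunk := src.read(CHUNK):
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # flush to stable storage before stopping the clock
    elapsed = time.perf_counter() - start
    os.remove(dest)
    return (copied / MIB) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Usage: python bench.py <sample file> <dir on ext4 or ZFS> [repeats]
    sample, target = sys.argv[1], sys.argv[2]
    repeats = int(sys.argv[3]) if len(sys.argv) > 3 else 3
    rates = [copy_throughput(sample, target) for _ in range(repeats)]
    print(f"{target}: best {max(rates):.1f} MiB/s over {repeats} runs")
```

On a ZFS dataset with compression enabled, a highly compressible sample may copy faster simply because less data is written, which is one more reason to benchmark with real data rather than synthetic files.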

user121391