I have been looking at options for reducing the storage cost incurred by replicating data across datanodes, which is done both for higher availability and for data locality during processing.
I came across a few articles suggesting erasure coding (software RAID) as one such mechanism, which can reportedly provide five to eight nines of availability while keeping the replication factor low.
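To make the storage trade-off concrete, here is a minimal Python sketch (not taken from the articles or the JIRA) comparing the raw-storage overhead of n-way replication with a (k, m) erasure code. The class and function names, and the example schemes RS(6,3) and RS(10,4), are illustrative assumptions only. The XOR part demonstrates the simplest single-parity (k, 1) code, not necessarily what HDFS would use, and shows why data locality becomes a concern: each block is split into fragments that would be stored on different nodes.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Scheme:
    """Hypothetical redundancy scheme with k data units and m redundant units.

    n-way replication is modeled as k=1, m=n-1 (one copy plus n-1 extras).
    """
    name: str
    k: int  # units holding original data
    m: int  # redundant units (extra replicas or parity fragments)

    def storage_overhead(self) -> float:
        # Raw bytes stored per logical byte of user data.
        return (self.k + self.m) / self.k

    def tolerated_failures(self) -> int:
        # Assumes an MDS code (e.g. Reed-Solomon), which can rebuild the data
        # from any k of its k+m units, so up to m units may be lost.
        return self.m


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(block: bytes, k: int) -> list:
    """Split a block into k equal fragments plus one XOR parity fragment."""
    size = -(-len(block) // k)  # ceiling division
    padded = block.ljust(size * k, b"\0")  # zero-pad so fragments are equal
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments: list, lost: int) -> bytes:
    """Rebuild the fragment at index `lost` by XOR-ing all the survivors."""
    survivors = [f for i, f in enumerate(fragments) if i != lost]
    return reduce(xor_bytes, survivors)


if __name__ == "__main__":
    for s in (Scheme("3x replication", 1, 2), Scheme("RS(6,3)", 6, 3),
              Scheme("RS(10,4)", 10, 4)):
        print(f"{s.name}: {s.storage_overhead():.2f}x raw storage, "
              f"survives {s.tolerated_failures()} lost units")

    frags = split_with_parity(b"hdfs block contents", k=4)
    assert recover(frags, lost=2) == frags[2]
```

Under these assumptions, 3x replication stores 3.00 bytes per logical byte and survives 2 lost copies, while RS(6,3) stores 1.50 bytes and survives 3 lost fragments. The cost is that reading or rebuilding a block touches several nodes instead of one.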
I also came across a JIRA issue for erasure coding in HDFS.
I would appreciate some help understanding the following:
1. How can I use erasure coding with Hadoop 1.1.1 release?
2. How will erasure coding work with the replication mechanism, and how will it affect data locality for processing, given that erasure coding fragments the data?
3. How mature is the current implementation of erasure coding in HDFS?
Any help will be greatly appreciated.
Thanks and Regards