Hadoop >> mail # user >> replicate data in HDFS with smarter encoding


Re: replicate data in HDFS with smarter encoding
Facebook contributed some code to do something similar called HDFS RAID:

http://wiki.apache.org/hadoop/HDFS-RAID

-Joey
On Jul 18, 2011, at 3:41, Da Zheng <[EMAIL PROTECTED]> wrote:

> Hello,
>
> It seems that data replication in HDFS is simply a copy of the data among
> nodes. Has anyone considered using a better encoding to reduce the data size?
> Say, a block of data is split into N pieces, and as long as M of the N pieces
> survive in the network, we can regenerate the original data.
>
> There are many benefits to reducing the data size: it saves network bandwidth
> and disk space, and thus reduces energy consumption. Computational cost might
> be a concern, but we could use GPUs to encode and decode.
>
> But maybe the idea is stupid or it's hard to reduce the data size. I would like
> to hear your comments.
>
> Thanks,
> Da