HBase >> mail # user >> Data Deduplication in HBase


Data Deduplication in HBase
Hi,

I have a use case in which I need to store segments of MP3 files in HBase.
A song may arrive at the application as different, overlapping segments.
For example, a 5-minute song may arrive as the segments 0-1, 0.5-2, 2-4
and 3-5 (in minutes). As you can see, some of the data is duplicated (for
example, minutes 3-4 are present in the last two segments).
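To make the overlap in this example explicit, here is a small sketch that treats each segment as a (start, end) interval in minutes and lists every pairwise overlap. It works on time ranges only, not on the MP3 bytes, and the helper names `overlap` and `duplicated_ranges` are illustrative, not part of any HBase API.

```python
from itertools import combinations

# Segments from the example, as (start, end) in minutes.
segments = [(0, 1), (0.5, 2), (2, 4), (3, 5)]


def overlap(a, b):
    """Return the overlapping (start, end) of two intervals, or None.

    Intervals that only touch (e.g. 0.5-2 and 2-4) do not overlap.
    """
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def duplicated_ranges(segs):
    """List every range covered by more than one segment (pairwise)."""
    dups = []
    for a, b in combinations(segs, 2):
        o = overlap(a, b)
        if o is not None:
            dups.append(o)
    return sorted(dups)


print(duplicated_ranges(segments))  # [(0.5, 1), (3, 4)]
```

So besides minutes 3-4, minutes 0.5-1 are also stored twice in this example.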

What would be the ideal way of avoiding this duplicate storage? Will Snappy
compression help here, or do I need to write some logic on top of HBase?
Also, what if I store the same segment multiple times? Will HBase do some
sort of deduplication?
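One possible form of "logic on top of HBase" is to trim each incoming segment against the time ranges already stored for that song, so that only uncovered ranges get written. The sketch below is a minimal, hypothetical illustration that works purely on time intervals: it assumes all segments of a song share one timeline, and it assumes that cutting the audio at those boundaries and the actual HBase writes are handled elsewhere. The names `subtract` and `ingest` are made up for this example.

```python
def subtract(segment, stored):
    """Return the parts of `segment` not already covered by `stored`.

    `segment` is a (start, end) interval; `stored` is a list of
    (start, end) intervals already written for the same song.
    """
    pieces = [segment]
    for s_start, s_end in sorted(stored):
        next_pieces = []
        for p_start, p_end in pieces:
            # Keep the part of the piece that lies before the stored interval.
            if p_start < min(p_end, s_start):
                next_pieces.append((p_start, min(p_end, s_start)))
            # Keep the part of the piece that lies after the stored interval.
            if max(p_start, s_end) < p_end:
                next_pieces.append((max(p_start, s_end), p_end))
        pieces = next_pieces
    return pieces


def ingest(segments):
    """Simulate ingesting segments in arrival order.

    Returns, for each segment, the list of new ranges that would be
    written. Hypothetical: a real system would trim the audio to these
    ranges and write only them to HBase.
    """
    stored = []
    written = []
    for seg in segments:
        new_parts = subtract(seg, stored)
        stored.extend(new_parts)
        written.append(new_parts)
    return written


print(ingest([(0, 1), (0.5, 2), (2, 4), (3, 5)]))
# [[(0, 1)], [(1, 2)], [(2, 4)], [(4, 5)]]
print(ingest([(2, 4), (2, 4)]))
# [[(2, 4)], []]
```

With this approach each minute of the song is written once, and re-sending an identical segment results in nothing new being written. The trade-off is that the application must read (or cache) the stored ranges for a song before each write.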

Regards,
Anand