Data Deduplication in HBase
Hi,

I have a use case in which I need to store segments of MP3 files in HBase.
A song may arrive at the application as different overlapping segments. For
example, a 5-minute song can arrive as the segments 0-1, 0.5-2, 2-4 and 3-5
(in minutes). As you can see, some of the data is duplicated (for example,
3-4 is present in the last two segments).
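
To make the overlap concrete, here is a minimal sketch that works out which
time ranges are covered by more than one segment. It assumes each segment is
a (start, end) pair in minutes; the function name duplicated_ranges is made
up for illustration, and nothing here touches HBase itself.

```python
def duplicated_ranges(segments):
    """Return the time ranges covered by more than one segment.

    segments: list of (start, end) pairs in minutes (an assumed format).
    """
    # Turn every segment into a +1 event at its start and a -1 event at its end.
    events = []
    for start, end in segments:
        events.append((start, 1))
        events.append((end, -1))

    # At equal times, ends (-1) sort before starts (+1), so segments that
    # merely touch (e.g. 0.5-2 and 2-4) are not reported as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    duplicates = []
    depth = 0          # number of segments covering the current point
    dup_start = None   # start of the duplicated range currently open, if any
    for time, delta in events:
        depth += delta
        if depth >= 2 and dup_start is None:
            dup_start = time
        elif depth < 2 and dup_start is not None:
            duplicates.append((dup_start, time))
            dup_start = None
    return duplicates


segments = [(0, 1), (0.5, 2), (2, 4), (3, 5)]
print(duplicated_ranges(segments))  # [(0.5, 1), (3, 4)]
```

For the segments above this reports 0.5-1 and 3-4, so 1.5 of the 6.5
minutes received would be stored twice.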

What would be the ideal way to avoid storing this duplicate data? Will Snappy
compression help here, or do I need to write some logic on top of HBase? Also,
what if I store the same segment multiple times? Will HBase do some sort of
deduplication?

Regards,
Anand