HBase >> mail # user >> Data Deduplication in HBase


Re: Data Deduplication in HBase
bq.  Will HBase do some sort of deduplication?

I don't think so.

What is the granularity of the segment overlap? In the example below, it
seems to be 0.5 minutes.
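
If the overlap really is aligned to a fixed granularity, one option is to do
the deduplication in the application by keying each fixed-size chunk on its
position in the song, so that overlapping segments write to the same keys.
Below is a minimal Python sketch of that idea, not a tested HBase client: the
dict stands in for an HBase table, and chunk_keys, store_segment, the
"song42" id, and the key format are all hypothetical names chosen for
illustration. It assumes every segment boundary is a multiple of 0.5 minutes.

```python
from fractions import Fraction

# Assumed chunk size in minutes, taken from the 0.5 overlap in the example.
GRANULARITY = Fraction(1, 2)


def chunk_keys(song_id, start, end, granularity=GRANULARITY):
    """Return one key per fixed-size chunk covering [start, end).

    Assumes start and end are exact multiples of the granularity.
    """
    first = int(Fraction(str(start)) / granularity)
    last = int(Fraction(str(end)) / granularity)
    # Zero-padded index so keys sort in playback order.
    return [f"{song_id}:{i:06d}" for i in range(first, last)]


def store_segment(table, song_id, start, end):
    """Store the chunks of one segment; return how many were new."""
    new_chunks = 0
    for key in chunk_keys(song_id, start, end):
        if key not in table:
            # Placeholder payload; a real application would store the audio
            # bytes for this chunk here.
            table[key] = b"<audio bytes>"
            new_chunks += 1
    return new_chunks


# The dict is a stand-in for an HBase table keyed by row key.
table = {}
segments = [(0, 1), (0.5, 2), (2, 4), (3, 5)]
written = [store_segment(table, "song42", s, e) for s, e in segments]

print(written)     # new chunks per segment
print(len(table))  # total chunks stored for the 5-minute song
```

With the four segments from the example, only 10 half-minute chunks end up
stored, which is exactly the 5 minutes of the song. If segment boundaries are
not aligned to a fixed granularity, this scheme does not apply as written.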

Cheers
On Tue, Aug 27, 2013 at 7:12 AM, Anand Nalya <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a use case in which I need to store segments of MP3 files in HBase.
> A song may come to the application in different overlapping segments. For
> example, a 5-minute song can have the following segments: 0-1, 0.5-2, 2-4, 3-5. As
> seen, some of the data is duplicate (3-4 is present in the last 2
> segments).
>
> What would be the ideal way of removing this duplicate storage? Will Snappy
> compression help here, or do I need to write some logic on top of HBase? Also,
> what if I store a single segment multiple times? Will HBase do some sort of
> deduplication?
>
> Regards,
> Anand
>