Re: Hbase update use case
If you mark a row by adding a column qualifier that serves as your flag
just by existing, and whose name sorts lexicographically first, then
checking it with a filter won't be as slow as you describe below.
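
A minimal Java sketch of the marker idea, using only the JDK: a `TreeMap` stands in for one row's cells within a single column family. HBase keeps qualifiers sorted in byte order within a family, and `String` order matches that for ASCII names. The marker name `!exported` and the class name are hypothetical choices; `!` (0x21) sorts before digits and letters, so the marker comes first as long as no other qualifier in that same family starts with a lower byte. A scan still visits every candidate row; the saving is that each row can be decided from its first cell alone.

```java
import java.util.TreeMap;

public class FirstQualifierFlag {
    // Hypothetical marker qualifier: '!' (0x21) sorts before digits and letters,
    // so it is the first qualifier unless another name starts with a lower byte.
    static final String MARKER = "!exported";

    // `cells` models one row's cells in one column family, sorted by qualifier.
    public static boolean isExported(TreeMap<String, String> cells) {
        // Only the first cell is inspected: if the marker exists it sorts first,
        // so any other first qualifier means the row has not been exported.
        return !cells.isEmpty() && cells.firstKey().equals(MARKER);
    }

    public static void markExported(TreeMap<String, String> cells) {
        // Existence is the flag; the value is unused.
        cells.put(MARKER, "");
    }

    public static void main(String[] args) {
        TreeMap<String, String> row = new TreeMap<>();
        row.put("text", "The cat sat on the mat.");
        row.put("lang", "en");
        System.out.println(isExported(row)); // false
        markExported(row);
        System.out.println(isExported(row)); // true
    }
}
```

Keeping the marker in the family that the export actually reads matters here: qualifiers are only ordered relative to others in the same family.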

On Monday, August 12, 2013, ccalugaru wrote:

> Hi all,
> I have the following HBase use case:
> One HBase table, with a row key (built from a combination of MD5 hashes)
> and 2 column families. Logically, the table stores sentences. The table
> has hundreds of millions of records.
>
> I have a webapp that connects to this HBase table and needs to randomly
> export sentences based on some conditions. Currently, all these conditions
> can be looked up just by using the row key.
> Typically, one export would contain just a couple of hundred sentences.
> The important restriction is that once some segments are exported, they
> should not be present in any subsequent export.
>
> So my question is this: how should I make sure the same segments do not
> get exported again?
>
> Should I 'mark' the exported segments by updating a flag after each
> export? This has the drawback that, when looking for segments that meet my
> conditions, I wouldn't be able to identify those records using just the
> row key; I would also need to check the flag. Hence, I would need to use
> filters, which I know are much slower.
>
> Is there a better approach for this?
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/Hbase-update-use-case-tp4049091.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
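
The "mark after export" approach from the question can be sketched in plain Java without an HBase cluster. Here a `TreeMap` stands in for the table (rows sorted by key, as in HBase), the selection condition is assumed to be a row-key prefix, and the boolean value plays the role of the flag; the class and method names are hypothetical. Candidates are gathered with a prefix range scan, already-flagged rows are skipped (the per-row check the question worries about), a random subset is picked, and the picked rows are flagged so later exports cannot return them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

public class ExportOnce {
    // In-memory stand-in for the table: row key -> "already exported" flag.
    // TreeMap keeps keys sorted, like HBase keeps rows sorted by key.
    private final NavigableMap<String, Boolean> rows = new TreeMap<>();

    public void put(String rowKey) {
        rows.put(rowKey, false);
    }

    // Returns up to `limit` random, not-yet-exported rows whose key starts
    // with `prefix`, and flags them so later exports skip them.
    public List<String> export(String prefix, int limit, Random rnd) {
        List<String> candidates = new ArrayList<>();
        // Keys sharing a prefix are contiguous in sorted order, so start at the
        // prefix and stop at the first key outside it (a prefix range scan).
        for (Map.Entry<String, Boolean> e : rows.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            if (!e.getValue()) {
                candidates.add(e.getKey()); // flag check: skip exported rows
            }
        }
        Collections.shuffle(candidates, rnd);
        List<String> picked =
            new ArrayList<>(candidates.subList(0, Math.min(limit, candidates.size())));
        for (String key : picked) {
            rows.put(key, true); // "mark" the row after it has been exported
        }
        return picked;
    }

    public static void main(String[] args) {
        ExportOnce table = new ExportOnce();
        for (String k : new String[] {"a1", "a2", "a3", "b1"}) {
            table.put(k);
        }
        Random rnd = new Random(42);
        System.out.println(table.export("a", 2, rnd)); // two of a1..a3
        System.out.println(table.export("a", 2, rnd)); // the remaining one
    }
}
```

Against a real table, the read-then-mark sequence is not atomic: two concurrent exports could pick the same row unless the flag is written with a conditional operation such as HBase's `checkAndPut`.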
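
Assuming the export conditions correspond to a row-key prefix (for example, the leading MD5 component of the key), the lookup becomes a range scan bounded by a start row and an exclusive stop row. The hypothetical helper below computes that stop row in unsigned byte order, which is how HBase orders rows: drop trailing 0xFF bytes and increment the last remaining byte. An empty result stands for "no upper bound", which is how HBase treats an empty stop row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixRange {
    // Returns the smallest key greater than every key that starts with `prefix`,
    // comparing bytes as unsigned values. An empty array means no such key
    // exists (the prefix is all 0xFF bytes), i.e. scan to the end of the table.
    public static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if ((prefix[i] & 0xff) != 0xff) {
                // Keep bytes [0, i] and bump byte i; trailing 0xFF bytes are dropped.
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] start = "a3f".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(start);
        // The range [start, stop) contains exactly the keys beginning with "a3f".
        System.out.println(new String(stop, StandardCharsets.US_ASCII)); // a3g
    }
}
```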