If you can mark a row by adding a column qualifier whose mere existence
serves as your flag, and you name it so that it sorts lexicographically first
within the row, then it won't be as slow as you fear with the filters you
mention below.
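
To make the idea concrete, here is a rough in-memory sketch in plain Java. It
is not HBase client code: a TreeMap stands in for the sorted cells of a row,
and the qualifier name "!exported", the class name and the method names are
all made up for illustration. It assumes ASCII qualifiers (HBase compares raw
bytes, Java compares chars, and the two agree for ASCII) and that no other
qualifier starts with a byte lower than '!'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model only: a row is a sorted map of qualifier -> value, which
// mirrors how HBase orders the cells of a row by qualifier within a family.
class ExportFlagSketch {
    // Hypothetical flag qualifier. '!' (0x21) sorts before digits and letters,
    // so when the flag exists it is the first cell of the row.
    static final String FLAG = "!exported";

    // rowkey -> (qualifier -> value), both sorted lexicographically.
    final TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();

    void put(String rowKey, String qualifier, String value) {
        table.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    // The flag is "set" simply by existing; its value is irrelevant.
    boolean isExported(String rowKey) {
        TreeMap<String, String> row = table.get(rowKey);
        return row != null && !row.isEmpty() && FLAG.equals(row.firstKey());
    }

    // Pick up to `limit` not-yet-exported rows whose key starts with `prefix`,
    // and mark each one so that later exports skip it.
    List<String> export(String prefix, int limit) {
        List<String> exported = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> e
                : table.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix) || exported.size() >= limit) {
                break; // left the key range, or the export is full
            }
            if (FLAG.equals(e.getValue().firstKey())) {
                continue; // only the first cell is inspected
            }
            e.getValue().put(FLAG, ""); // empty value: existence is the flag
            exported.add(e.getKey());
        }
        return exported;
    }
}
```

Calling export("a", 10) twice on the same data returns the matching rows the
first time and an empty list the second time. In a real table the mark and the
check would have to be made safe against concurrent exports, which this sketch
does not attempt.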
On Monday, August 12, 2013, ccalugaru wrote:
> Hi all,
> I have the following HBase use case:
> One HBase table, with a row key (built with a combination of md5 hashes) and
> 2 column families. Logically, the table stores sentences. The table has
> hundreds of millions of records.
> I have a webapp that connects to this HBase table, and needs to randomly
> export sentences, based on some conditions. Currently, all these conditions
> can be looked-up just by using the rowkey.
> Typically, one export would contain just a couple of hundred sentences. An
> important restriction is that once some segments are exported, they should
> not be present in any subsequent export.
> So my question is related to this - how should I make sure the same
> segments do not get exported again?
> Should I 'mark' the exported segments, by updating a flag, after each
> export happens? This has the drawback that, when looking at which segments meet my
> conditions, I wouldn't be able to use just the rowkey for identifying those
> records, but also that flag. Hence, I would need to use filters, which I
> know are way slower.
> Is there a better approach for this?