HBase >> mail # user >> Set like functionality


Re: Set like functionality
A lot of your design depends on your read/write rate & the amount of
duplication in your inserts.  For example, if your read rate is really low
and your write rate is really high with a low dedupe, you could try:

Row = USER_ID
Column Qualifier = PRODUCT_ID
MAX_VERSIONS = 1
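
As a rough sketch of how this layout behaves, here is a toy in-memory model
in Python. It is not the HBase client API: the ViewedProducts class and its
put/recent methods are made up for illustration, and the sample data is the
one from the original question. Each write is a blind put, and only the
newest version per (row, qualifier) is kept.

```python
from collections import defaultdict


class ViewedProducts:
    """Toy model of one column family with MAX_VERSIONS = 1 (hypothetical)."""

    def __init__(self):
        # row key (USER_ID) -> {column qualifier (PRODUCT_ID): newest timestamp}
        self._rows = defaultdict(dict)

    def put(self, user_id, product_id, timestamp):
        # Blind write, no read first: only the newest version of a cell survives.
        cells = self._rows[user_id]
        if timestamp >= cells.get(product_id, -1):
            cells[product_id] = timestamp

    def recent(self, user_id):
        # Product ids for one user, most recently viewed first.
        cells = self._rows.get(user_id, {})
        return sorted(cells, key=cells.get, reverse=True)


store = ViewedProducts()
store.put("mark", "Product 123", 1328731167014262)
store.put("mark", "Product 456", 1328731162502304)
store.put("mark", "Product 789", 1328731157711375)
store.put("mark", "Product 789", 1328731292355173)  # viewed again
print(store.recent("mark"))
# → ['Product 789', 'Product 123', 'Product 456']
```

Note that HBase returns the columns of a row sorted by qualifier, so the
"most recent first" ordering in this sketch would have to happen on the
client using the cell timestamps.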

Setting the max versions for a CF to 1 basically lets the dedupe kick in
& lets you treat your column qualifiers as a set.  Putting the data in the
column qualifier instead of the value field means that you'll dedupe on
demand at read time instead of doing a read-modify-write (RMW).  That said,
RMW works better with high dedupe or a high read rate because you'd
otherwise write unnecessary duplicate values on flush.  Also, with
read-modify-write, consider using bloom filters if you have a high miss
rate.  It's cheaper to do a bloom filter query of a really large file if
the key doesn't exist most of the time.  We used this to store unique
email thread UUIDs for our messaging application.
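
To illustrate the read-modify-write variant with a bloom filter in front of
it, here is a minimal Python sketch. It only shows the idea and is not how
HBase implements its bloom filters; BloomFilter, DedupingStore, the filter
size and the hash count are all assumptions picked for the example. The
point is that when the filter says "definitely not present", the expensive
lookup is skipped entirely.

```python
import hashlib


class BloomFilter:
    """Small bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class DedupingStore:
    """Read-modify-write dedupe; the set stands in for the costly file read."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.rows = set()
        self.lookups = 0  # how many times we paid for the real lookup

    def add_if_absent(self, key):
        if self.bloom.might_contain(key):
            self.lookups += 1  # only now do the expensive read
            if key in self.rows:
                return False  # duplicate, nothing to write
        self.rows.add(key)
        self.bloom.add(key)
        return True


store = DedupingStore()
first = store.add_if_absent("mark/Product 789")
second = store.add_if_absent("mark/Product 789")
print(first, second, store.lookups)
# → True False 1
```

With a high miss rate most calls never reach the lookup, which is where the
savings come from.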

I'm guessing this might be a little too advanced for your question if
you're just getting up and going.  I'm mostly trying to help you understand
that you should think about how your read/write/re-write/modify data flow
is going to look, because HBase has a lot of knobs to optimize for a wide
variety of flow situations.

Nicolas

On 2/10/12 4:45 AM, "weichao" <[EMAIL PROTECTED]> wrote:

>Maybe you can build an index table, like
>
>rowkey:[USER_ID/ProductID] = { rk => main-table's rowkey}
>
>When a product is viewed, check the index, find the rk, and use the rk to
>get the row from the main table. Delete this row, then update the index
>table's rk.
>
>Of course, using a coprocessor to handle this may make it simpler...
>
>
>2012/2/9 Mark <[EMAIL PROTECTED]>
>
>> We would like to maintain a history of all product views by a given
>> user. We are currently using a row key like USER_ID/TIMESTAMP. This
>> works; however, we would like to maintain a unique list of these
>> user-to-product views.
>>
>> So if I have rows like:
>>
>> mark/1328731167014262  = { data => 'Product 123' }
>> mark/1328731162502304  = { data => 'Product 456' }
>> mark/1328731157711375  = { data => 'Product 789' }
>>
>> And I view Product 789 again I want it to be like:
>>
>> mark/1328731292355173  = { data => 'Product 789' }
>> mark/1328731167014262  = { data => 'Product 123' }
>> mark/1328731162502304  = { data => 'Product 456' }
>>
>> So it basically replaces the old value. How can this be accomplished?
>>
>> Thanks
>>