Re: Set like functionality
A lot of your design depends on your read/write rate & the amount of
duplication in your inserts.  For example, if your read rate is really low
and your write rate is really high with a low dedupe, you could try:

Row = USER_ID
Column Qualifier = PRODUCT_ID
MAX_VERSIONS = 1

Setting the max versions for a CF to 1 basically lets the dedupe kick in
& treats your column qualifier as a set.  Putting the data in the column
qualifier instead of the value field means that you'll dedupe on demand at
read time instead of doing a read-modify-write.  That said, RMW works
better with high dedupe or a high read rate because you'd otherwise write
unnecessary duplicate values on flush.  Also, with read-modify-write,
consider using bloom filters if you have a high miss rate.  It's cheaper
to do a bloom filter query of a really large file if the key doesn't exist
most of the time.  We used this to store unique email thread UUIDs for our
messaging application.
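
To make the layout above concrete, here is a rough sketch in plain Python.
It is only a toy in-memory model, not HBase client code: the ToyTable
class and its put/recent methods are made-up names for illustration, and
it assumes the cell timestamp is used as the view time.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for one column family with MAX_VERSIONS = 1 (hypothetical)."""

    def __init__(self):
        # row key (USER_ID) -> {column qualifier (PRODUCT_ID) -> newest timestamp}
        self.rows = defaultdict(dict)

    def put(self, row, qualifier, ts):
        # Only one version is kept, so a newer write to the same
        # qualifier replaces the older cell: the qualifiers act as a set.
        current = self.rows[row].get(qualifier)
        if current is None or ts >= current:
            self.rows[row][qualifier] = ts

    def recent(self, row):
        # Order the set on read, newest view first.
        cells = self.rows[row].items()
        return sorted(cells, key=lambda cell: cell[1], reverse=True)


def demo():
    table = ToyTable()
    table.put("mark", "Product 789", 1328731157711375)
    table.put("mark", "Product 456", 1328731162502304)
    table.put("mark", "Product 123", 1328731167014262)
    # Viewing Product 789 again just overwrites its single cell.
    table.put("mark", "Product 789", 1328731292355173)
    return [product for product, _ in table.recent("mark")]
```

Here demo() writes the same qualifier twice and still ends up with three
cells for the row, so the writer never has to read before writing; the
ordering work is pushed to the reader.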

I'm guessing this might be a little too advanced for your question if
you're just getting up and going.  I'm more trying to help you understand
that you should think about how your read/write/re-write/modify data flow
is going to look because HBase has a lot of knobs to optimize for a wide
variety of flow situations.

Nicolas

On 2/10/12 4:45 AM, "weichao" <[EMAIL PROTECTED]> wrote:

>Maybe you can build an index table, like
>
>rowkey:[USER_ID/ProductID] = { rk => main-table's rowkey}
>
>When a product is viewed, check the index, find the rk, and use the rk to
>get the row from the main table. Delete this row, then modify the index
>table's rk.
>
>Of course, using a coprocessor to handle this may make it simpler...
>
>
>2012/2/9 Mark <[EMAIL PROTECTED]>
>
>> We would like to maintain a history of all product views by a given
>> user. We are currently using a row key like USER_ID_ID/TIMESTAMP. This
>> works however we would like to maintain a unique list of these users to
>> product views.
>>
>> So if i have rows like:
>>
>> mark/1328731167014262  = { data => 'Product 123' }
>> mark/1328731162502304  = { data => 'Product 456' }
>> mark/1328731157711375  = { data => 'Product 789' }
>>
>> And I view Product 789 again I want it to be like:
>>
>> mark/1328731292355173  = { data => 'Product 789' }
>> mark/1328731167014262  = { data => 'Product 123' }
>> mark/1328731162502304  = { data => 'Product 456' }
>>
>> So it basically replaces the old value. How can this be accomplished?
>>
>> Thanks
>>
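
For comparison, the index-table idea quoted above could be sketched
roughly as follows. Again this is a toy in-memory model in plain Python,
not HBase client code: ToyStore, record_view and history are hypothetical
names, and the two dicts stand in for the main table and the index table.

```python
class ToyStore:
    """Toy model of a main table plus an index table (hypothetical)."""

    def __init__(self):
        self.main = {}   # "USER_ID/TIMESTAMP" -> product
        self.index = {}  # "USER_ID/PRODUCT_ID" -> row key in the main table

    def record_view(self, user, product, ts):
        index_key = f"{user}/{product}"
        # Read: look up the previous row for this user/product pair, if any.
        old_rk = self.index.get(index_key)
        if old_rk is not None:
            # Modify: drop the stale row from the main table.
            self.main.pop(old_rk, None)
        # Write: store the new row and point the index at it.
        new_rk = f"{user}/{ts}"
        self.main[new_rk] = product
        self.index[index_key] = new_rk

    def history(self, user):
        prefix = user + "/"
        keys = [k for k in self.main if k.startswith(prefix)]
        # Newest view first, ordered by the numeric timestamp in the key.
        keys.sort(key=lambda k: int(k[len(prefix):]), reverse=True)
        return [(k, self.main[k]) for k in keys]
```

Every view here costs a read of the index plus a delete and two writes,
which is the read-modify-write trade-off discussed in the reply. In a real
cluster these steps would not be atomic across the two tables unless
something extra is done to coordinate them.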