HBase user mailing list: issue about rowkey design


Re: issue about rowkey design
Multiple random seeks?

Sorry, you've lost me.

In a simple design, you use an inverted table where the indexed value is the row key and the columns contain the base table's row keys.
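A minimal sketch of that inverted table, assuming a base table keyed by session ID (a hypothetical choice; the thread does not fix one) and modeling each HBase table as an in-memory Python dict of `{row key: {column qualifier: value}}` rather than using the real client API. The names `put_session` and `sessions_for_user` are hypothetical.

```python
from __future__ import annotations

# Hypothetical in-memory stand-ins for HBase tables: row key -> {qualifier: value}.
# Column families, timestamps and versions are omitted for brevity.


def put_session(base: dict, index: dict, session_id: str, user_id: str, visit_time: str) -> None:
    """Write one visit to the base table and keep the inverted table in step."""
    base.setdefault(session_id, {}).update({"user_id": user_id, "visit_time": visit_time})
    # Inverted table: row key = indexed value (user ID), qualifier = base table row key.
    index.setdefault(user_id, {})[session_id] = ""


def sessions_for_user(index: dict, user_id: str) -> list[str]:
    """A single lookup of the index row yields every matching base table row key."""
    return sorted(index.get(user_id, {}))


base: dict = {}
index: dict = {}
put_session(base, index, "s1", "u1", "20130816T053300")
put_session(base, index, "s2", "u1", "20130818T182500")
put_session(base, index, "s3", "u2", "20130818T212100")
print(sessions_for_user(index, "u1"))  # ['s1', 's2']
```

Only IDs are duplicated into the inverted table; the visit data itself stays in the base table.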

One get() and you have the row keys of all of the rows in the base table that match the key.
The only gotcha is if your row exceeds the size of a region.
To get around this, you could write a function that periodically splits the oversized row into several rows and keeps those rows in order, so that your keys are always in sort order. Then your get() becomes a scan where you know the start row and stop row in advance, and it returns all of the matching rows in your base table.
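The split-and-scan workaround can be sketched the same way. All of the following are assumptions for illustration: chunk rows are keyed `user_id#NNNN` with a zero-padded chunk number, the split threshold is a small qualifier count rather than a size in bytes, and a scan is modeled as a sorted-key range with an inclusive start row and an exclusive stop row (which is how HBase scans treat start and stop rows). Python string comparison stands in for HBase's byte-wise key ordering; the two agree for ASCII keys. `split_index_row`, `scan` and `scan_sessions_for_user` are hypothetical names.

```python
from __future__ import annotations

import bisect

# Hypothetical split threshold; a real rule would be based on row size in bytes.
MAX_QUALIFIERS_PER_ROW = 3


def split_index_row(user_id: str, base_row_keys: list[str],
                    max_per_row: int = MAX_QUALIFIERS_PER_ROW) -> dict[str, dict[str, str]]:
    """Split one oversized index row into chunk rows whose keys stay in sort order."""
    keys = sorted(base_row_keys)
    rows: dict[str, dict[str, str]] = {}
    for chunk, start in enumerate(range(0, len(keys), max_per_row)):
        # Zero-padded chunk number so string order matches numeric order.
        rows[f"{user_id}#{chunk:04d}"] = {k: "" for k in keys[start:start + max_per_row]}
    return rows


def scan(table: dict, start_row: str, stop_row: str) -> list[tuple[str, dict]]:
    """Mimic a scan: rows in key order, start row inclusive, stop row exclusive."""
    keys = sorted(table)
    lo = bisect.bisect_left(keys, start_row)
    hi = bisect.bisect_left(keys, stop_row)
    return [(k, table[k]) for k in keys[lo:hi]]


def scan_sessions_for_user(index: dict, user_id: str) -> list[str]:
    # '$' sorts immediately after '#', so [user_id#, user_id$) covers only this user's chunks.
    rows = scan(index, f"{user_id}#", f"{user_id}$")
    return [qualifier for _, columns in rows for qualifier in columns]


index: dict = {}
index.update(split_index_row("u1", ["s5", "s1", "s4", "s2", "s3"]))
index.update(split_index_row("u10", ["s9"]))
print(sorted(index))                        # ['u1#0000', 'u1#0001', 'u10#0000']
print(scan_sessions_for_user(index, "u1"))  # ['s1', 's2', 's3', 's4', 's5']
```

The `#` separator matters: a bare prefix scan on `u1` would also pick up the rows of user `u10`, whereas the `[u1#, u1$)` range excludes them.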

This would be an efficient way to get rows based on a secondary index; however, you're really going to want to be careful about how you use it.
On Aug 18, 2013, at 9:21 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:

> A secondary index requires multiple random seeks and is not efficient in many cases.
> What you need is a different row key for each request type:
>
> user_id, session_id, visit_time =>
>
> rowkey1 => "q1", visit_time, user_id
> rowkey2 => "q2", visit_time, session_id
> rowkey3 => "q3", user_id, session_id : ts = visit_time
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [EMAIL PROTECTED]
>
> ________________________________________
> From: [EMAIL PROTECTED] [[EMAIL PROTECTED]]
> Sent: Sunday, August 18, 2013 6:25 PM
> To: [EMAIL PROTECTED]; Kiru Pakkirisamy
> Subject: Re: issue about rowkey design
>
> You can use a secondary table as a 'secondary index', storing your base table's row key as a value (or column) in it.
> Sent from my BlackBerry on Personal (http://www.personal.com.ar/)
>
> -----Original Message-----
> From: ch huang <[EMAIL PROTECTED]>
> Date: Mon, 19 Aug 2013 09:05:19
> To: <[EMAIL PROTECTED]>; Kiru Pakkirisamy<[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: issue about rowkey design
>
> What do you mean by secondary index? Does HBase have secondary indexes?
>
> On Sat, Aug 17, 2013 at 12:48 AM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:
>
>> We did a design with something equivalent to user ID as the key and all of
>> that user's sessions in the row.
>> But when we tried to look for particular user sessions within a time
>> range, we found the ColumnPrefixFilter (say, on the time range) did not give
>> us good performance.
>> So we ended up creating another table with the time range as the key and all
>> the user session IDs in it (equivalent).
>> I am pretty much repeating Bryan, but if you just store the IDs, you do not
>> duplicate that much data (is this what's called a secondary index?)
>>
>> Regards,
>> - kiru
>>
>>
>> Kiru Pakkirisamy | webcloudtech.wordpress.com
>>
>>
>> ________________________________
>> From: Bryan Beaudreault <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Friday, August 16, 2013 8:06 AM
>> Subject: Re: issue about rowkey design
>>
>>
>> HBase is all about denormalization and designing for the use case/query
>> pattern. If it's possible for your application, it will be better to
>> provide three different indexes, as opposed to fitting them all into one
>> rowkey design.
>>
>>
>> On Fri, Aug 16, 2013 at 5:33 AM, ch huang <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>     I have data (a very large amount of it) with user ID, session ID, and
>>> visit time. My query patterns are: "find all user IDs in a certain time
>>> range", "find all session IDs of one user", and "find all session IDs in a
>>> certain time range".
>>>   My difficulty is that I cannot find a rowkey that is good for all of these
>>> search patterns. I wonder if I need to set up three rowkeys for these search
>>> patterns, which means I would need to triple my data storage. Any good ideas?
>>>
>>
>
>
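Vladimir's one-rowkey-per-request-type layout (q1, q2, q3) can be sketched under similar assumptions: a single table modeled as a Python dict with sorted-key range scans (start row inclusive, stop row exclusive), `|` as a field separator that never appears in IDs, and `visit_time` as a fixed-width string such as `20130818T212100`, so that lexicographic order is chronological. For q3 the visit time would live in the cell timestamp; here it is simply stored as the value. All function names are hypothetical. Each visit is written three times, which is the storage multiplication ch huang asked about, but each copy holds only IDs, as Kiru points out.

```python
from __future__ import annotations

import bisect

SEP = "|"  # assumption: user and session IDs never contain '|'


def write_visit(table: dict, user_id: str, session_id: str, visit_time: str) -> None:
    """Write the three query-specific row keys for one visit."""
    table[SEP.join(["q1", visit_time, user_id])] = ""
    table[SEP.join(["q2", visit_time, session_id])] = ""
    # q3: visit time would be the cell timestamp in HBase; stored as the value here.
    table[SEP.join(["q3", user_id, session_id])] = visit_time


def scan_keys(table: dict, start_row: str, stop_row: str) -> list[str]:
    """Row keys in sorted order, start row inclusive, stop row exclusive."""
    keys = sorted(table)
    return keys[bisect.bisect_left(keys, start_row):bisect.bisect_left(keys, stop_row)]


def prefix_stop(prefix: str) -> str:
    # Smallest key greater than every key starting with prefix: bump the last character.
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def users_in_range(table: dict, t0: str, t1: str) -> list[str]:
    keys = scan_keys(table, SEP.join(["q1", t0]), SEP.join(["q1", t1]))
    return sorted({k.split(SEP)[2] for k in keys})


def sessions_of_user(table: dict, user_id: str) -> list[str]:
    prefix = SEP.join(["q3", user_id, ""])  # e.g. "q3|u1|"
    return [k.split(SEP)[2] for k in scan_keys(table, prefix, prefix_stop(prefix))]


def sessions_in_range(table: dict, t0: str, t1: str) -> list[str]:
    keys = scan_keys(table, SEP.join(["q2", t0]), SEP.join(["q2", t1]))
    return [k.split(SEP)[2] for k in keys]


table: dict = {}
write_visit(table, "u1", "s1", "20130816T053300")
write_visit(table, "u1", "s2", "20130818T182500")
write_visit(table, "u2", "s3", "20130818T212100")
print(users_in_range(table, "20130818T000000", "20130819T000000"))     # ['u1', 'u2']
print(sessions_of_user(table, "u1"))                                    # ['s1', 's2']
print(sessions_in_range(table, "20130816T000000", "20130817T000000"))  # ['s1']
```

Because every query type owns its own key prefix, each of the three questions becomes a single contiguous range scan instead of a filter over one shared layout.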