
HBase, mail # user - Secondary indexes suggestions


Re: Secondary indexes suggestions
Michael Segel 2012-08-14, 00:28
Not really a good idea or anything new.
Essentially a full table scan where you're doing a closer inspection on the key to see if it matches your search regex, before actually fetching the entire row and returning it.
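To make the cost argument concrete, here is a minimal Python sketch of a key-filtered full scan as described in this reply. It is an in-memory model, not HBase's `FuzzyRowFilter`. A plain dict stands in for the table, and the `?` wildcard notation and the names `key_matches` and `filtered_scan` are hypothetical.

```python
# In-memory model of a key-filtered full table scan (not an HBase API).
# Assumption: "?" in the pattern is a hypothetical wildcard meaning
# "any character at this position"; keys are visited in sorted order,
# the way HBase returns rows.

def key_matches(key: str, pattern: str) -> bool:
    """Fixed-length, position-by-position match; '?' matches anything."""
    if len(key) != len(pattern):
        return False
    return all(p == "?" or p == k for k, p in zip(key, pattern))


def filtered_scan(table: dict, pattern: str):
    """Inspect every row key; fetch the full row only when the key matches.

    Returns (matching rows, number of keys inspected).
    """
    results = []
    inspected = 0
    for key in sorted(table):          # every key is visited: a full scan
        inspected += 1
        if key_matches(key, pattern):  # cheap check on the key alone
            results.append((key, table[key]))  # only now "fetch" the row
    return results, inspected


if __name__ == "__main__":
    table = {
        "u01_m001": {"text": "hi"},
        "u02_m001": {"text": "hello"},
        "u01_m002": {"text": "bye"},
        "u03_m007": {"text": "yo"},
    }
    rows, inspected = filtered_scan(table, "???_m001")
    print(rows)       # [('u01_m001', {'text': 'hi'}), ('u02_m001', {'text': 'hello'})]
    print(inspected)  # 4: every key was inspected
```

Every key is inspected even though only two rows match; that per-key cost over the whole table is what the reply objects to.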

Secondary indexes are pretty straightforward.
You have your primary key and then your value.
A secondary index is a separate table whose row key is one of the values from the main (base) table, and whose value is the row key from the base table.

So say your main key is 12345, and you store {'Fred', 'Cleveland', 'Ohio'} == {Name, City, State}.

You could create an index on State where you store 'Ohio' as the key and 12345 as a column value.

Then if you look up the row with the key 'Ohio' in the second table, you'll get the keys of all the rows in the base table whose State is 'Ohio'. In this example, that's the row with the key '12345' ...
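A minimal Python sketch of the base-table / index-table layout just described, assuming plain dicts stand in for the two HBase tables. The class `IndexedTable` and its methods are hypothetical, not an HBase API. Each write updates both tables, and a lookup reads the index row first and then fetches the referenced base rows.

```python
# Model of a base table plus one secondary-index table (not an HBase API).
# Assumption: an index row holds a set of base-table row keys; in HBase
# these would typically be stored as columns of the index row.

class IndexedTable:
    def __init__(self, indexed_column: str):
        self.indexed_column = indexed_column
        self.base = {}   # base row key -> {column: value}
        self.index = {}  # indexed value -> set of base row keys

    def put(self, row_key: str, row: dict) -> None:
        # If this row is being overwritten, drop its stale index entry.
        old = self.base.get(row_key)
        if old is not None and self.indexed_column in old:
            old_value = old[self.indexed_column]
            keys = self.index.get(old_value, set())
            keys.discard(row_key)
            if not keys:
                self.index.pop(old_value, None)
        # Write the base row, then the index entry pointing back at it.
        self.base[row_key] = row
        value = row.get(self.indexed_column)
        if value is not None:
            self.index.setdefault(value, set()).add(row_key)

    def find_by(self, value: str) -> list:
        # 1) read the index row keyed by the value
        # 2) fetch each referenced row from the base table
        return [(k, self.base[k]) for k in sorted(self.index.get(value, ()))]


if __name__ == "__main__":
    people = IndexedTable("State")
    people.put("12345", {"Name": "Fred", "City": "Cleveland", "State": "Ohio"})
    print(people.index)            # {'Ohio': {'12345'}}
    print(people.find_by("Ohio"))  # [('12345', {'Name': 'Fred', 'City': 'Cleveland', 'State': 'Ohio'})]
```

In real HBase the base-table put and the index-table put go to different tables and are not atomic together, so a failure between them can leave the index out of sync with the base table.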
HTH
On Aug 13, 2012, at 4:49 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Lukáš, have a look at this recent post on this topic:
>
>
> http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
>
>
> Otis
>
>
>
>> ________________________________
>> From: Lukáš Drbal <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Sunday, August 12, 2012 8:15 AM
>> Subject: Secondary indexes suggestions
>>
>> Hi all,
>>
>> I am a new HBase user and I need help with secondary indexes.
>>
>> For example, I have messages and users. Each user has many messages.
>> The data structure will look like this:
>>
>> Message:
>> - String id
>> - Long sender_id
>> - Long recipient_id
>> - String text
>> - Timestamp created_at
>> [...]
>>
>> User:
>> - Long id
>> - String username
>> [...]
>>
>> I need to create secondary indexes for reading all messages:
>> a) inbox (by recipient_id) in a time range
>> b) outbox (by sender_id) in a time range
>>
>> Can someone give me suggestions for these index(es) and for the column
>> family attributes?
>> I expect 500M messages and 50M users here.
>>
>> Thanks a lot for any response.
>>
>>
>> P.S. Sorry for my bad English; it isn't my first language.
>>
>>
>> Lukas Drbal
>>
>>
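Applying the index-table idea from the reply above to the inbox/outbox question, one possible layout (an assumption, not something proposed in the thread) is one index table per direction whose row key is the user id, then the creation timestamp, then the message id. A time-range query then becomes a single contiguous range scan. The sketch below models this in Python with a sorted list standing in for an HBase table; `index_key`, `SortedIndex`, and `messages_in_range` are hypothetical names.

```python
import bisect
import struct

# Hypothetical index row-key layout (an assumption, not from the thread):
#   8-byte big-endian user id | 8-byte big-endian timestamp (ms) | message id
# Fixed-width big-endian unsigned integers sort byte-wise in numeric order,
# matching how HBase orders row keys.

def index_key(user_id: int, ts_ms: int, message_id: str) -> bytes:
    return struct.pack(">QQ", user_id, ts_ms) + message_id.encode("utf-8")


class SortedIndex:
    """In-memory stand-in for an HBase index table (sorted row keys)."""

    def __init__(self):
        self.keys = []  # kept sorted, like HBase row keys

    def put(self, key: bytes) -> None:
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:  # ignore duplicates
            self.keys.insert(i, key)

    def scan(self, start: bytes, stop: bytes) -> list:
        """Keys in [start, stop), like a scan with start/stop rows."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return self.keys[lo:hi]


def messages_in_range(index: SortedIndex, user_id: int, t_from: int, t_to: int) -> list:
    """Message ids for user_id with t_from <= timestamp < t_to."""
    start = struct.pack(">QQ", user_id, t_from)
    stop = struct.pack(">QQ", user_id, t_to)
    return [k[16:].decode("utf-8") for k in index.scan(start, stop)]


if __name__ == "__main__":
    inbox, outbox = SortedIndex(), SortedIndex()
    # (message id, sender_id, recipient_id, created_at in ms)
    messages = [("m1", 7, 42, 1000), ("m2", 9, 42, 2000), ("m3", 42, 7, 3000)]
    for mid, sender, recipient, ts in messages:
        inbox.put(index_key(recipient, ts, mid))   # inbox: by recipient_id
        outbox.put(index_key(sender, ts, mid))     # outbox: by sender_id
    print(messages_in_range(inbox, 42, 0, 2500))   # ['m1', 'm2']
    print(messages_in_range(outbox, 7, 0, 2500))   # ['m1']
```

Because the message id ends the index key, each hit can be used to fetch the full message from the base table, just as '12345' is in the Ohio example. For newest-first ordering, a common variant stores a reversed timestamp (a maximum value minus the timestamp) in place of the timestamp.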