Re: HBase secondary index performance
2010/9/3 Murali Krishna. P <[EMAIL PROTECTED]>:

>        * custom indexing is good, but our data keeps changing every day. So, probably
> indextable is the best option for us

With custom indexing you can use timestamps to check that an index
record is still valid (or even simply recheck the existence of the value).
You also need regular index cleanup (an MR job or some custom application).

To index a row identified by 'key' and holding 'value', we can create an
index table whose row key is [value:key], and insert an index row every
time we insert a value. We got 30k rows/s/node.
To find all rows with a given 'value', we scan [value:0000, value:9999]
and collect the keys, which point to the rows containing that value.
So: scan the index, do random gets for the rows, recheck that each index
entry is still valid (check the value or the timestamp; the index timestamp
should be >= the value timestamp), and return only the valid rows. Maybe we
can even delete on the fly when a recheck fails, to automatically clean up
stale data.

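A rough sketch of the scheme above, in Python against a toy in-memory
sorted table rather than the real HBase client. The names Table,
put_indexed and find_by_value are made up for illustration, and it assumes
values never contain the ':' separator.

```python
import bisect

SEP = ":"  # assumed separator between value and row key


class Table:
    """Toy stand-in for a sorted key/value table (not the HBase client API)."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order, as in HBase
        self._rows = {}   # row key -> (value, timestamp)

    def put(self, key, value, ts):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = (value, ts)

    def get(self, key):
        return self._rows.get(key)

    def delete(self, key):
        if key in self._rows:
            del self._rows[key]
            self._keys.remove(key)

    def scan(self, start, stop):
        """Yield (key, cell) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        # The slice is a copy, so callers may delete rows while iterating.
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def put_indexed(main, index, key, value, ts):
    """Write the row and its index entry [value:key] with the same timestamp."""
    main.put(key, value, ts)
    index.put(value + SEP + key, "", ts)


def find_by_value(main, index, value):
    """Return keys whose current value equals `value`, dropping stale entries."""
    start = value + SEP
    stop = value + chr(ord(SEP) + 1)  # first string after every "value:..." key
    hits = []
    for index_key, (_, index_ts) in index.scan(start, stop):
        key = index_key[len(start):]
        row = main.get(key)
        # Valid only if the row still holds the value and the index entry
        # is at least as new as the row.
        if row is not None and row[0] == value and index_ts >= row[1]:
            hits.append(key)
        else:
            index.delete(index_key)  # clean up stale data on the fly
    return hits
```

Overwriting a row with a new value leaves the old index entry behind. The
next lookup for the old value notices the mismatch and removes it, so a
periodic cleanup job only has to catch entries that nobody queries.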
>        * Just added one more regionserver and it did not help. Actually it went back
> to 60/s for some strange reason(with one client). The requests in the hbase ui
> is not uniform across 2 region servers. One server is doing around 2000 and the
> other 500. Probably once the region gets split and when we have lots of data,
> writes will improve ? (Now it is just writing to one region for the main table)

It looks like all the data goes to one region server. Try to make the writes
more random (maybe use a random UUID as the key, or some other key
randomization technique).
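
Two possible ways to randomize keys, sketched in plain Python. The function
names, the bucket count of 16 and the "NN-key" layout are assumptions for
illustration, not anything HBase prescribes.

```python
import hashlib
import uuid


def random_key():
    """Random UUID as the row key: consecutive writes share no key prefix."""
    return uuid.uuid4().hex


def salted_key(key, buckets=16):
    """Prefix the key with a hash-derived bucket; the original key stays readable."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = digest[0] % buckets  # same key always maps to the same bucket
    return "%02d-%s" % (bucket, key)
```

The salted form is deterministic, so a get for a known key can recompute its
prefix. The cost is that a range scan over the original key order has to be
issued once per bucket.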

>        * Is there some way to do bulk load the indexedtable? Earlier I have used the
> bulk loader tool (mapreduce job which creates the regions offline) but not sure
> whether it works with indexed table.

Not sure, but you can look at the source code and try to emulate the
indexing operations in your own code after a regular bulk load.
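
As a very rough idea of what "emulating" could mean for the custom
[value:key] layout described earlier in this message (this is not the row
format the indexed table contrib uses internally, and build_index_rows is a
made-up name): derive the index rows from the rows that were bulk loaded.

```python
def build_index_rows(main_rows, sep=":"):
    """Turn (key, value, timestamp) rows into sorted (index_key, timestamp) rows."""
    index_rows = [(value + sep + key, ts) for key, value, ts in main_rows]
    index_rows.sort()  # keep index rows in key order, like the table itself
    return index_rows
```

The resulting rows could then be written to the index table with normal
puts, or fed to a second bulk load.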

>  Thanks,
> Murali Krishna