HBase, mail # user - HBase secondary index performance


Re: HBase secondary index performance
Andrey Stepachev 2010-09-06, 18:46
2010/9/6 Murali Krishna. P <[EMAIL PROTECTED]>:
> Hi,
>   My row size is around 300 bytes, with 20 columns in total. I tried the custom
> indexing without writing to the WAL. I currently have only 2 tables, one for the
> main table and another for all 20 indexes. My key to the index table is
> columnValue+columnName+rowKey.

As mentioned before, you can randomize your index insertions.
If you don't need ordered scans or range scans on columnValue, you can
prefix it with a hash, e.g. sha(columnValue) + columnValue +
columnName + rowKey.
This removes the hotspot on one of your region servers.
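A minimal Java sketch of the salted index key described above. It assumes SHA-1 as the "sha" function and UTF-8 for every key part, and it keeps the thread's delimiter-free layout. The class and method names (`SaltedIndexKey`, `indexRowKey`, `lookupPrefix`) are hypothetical and are not part of HBase. Every index entry for a given value still shares the prefix sha(value) + value, so an exact-match lookup remains a single prefix scan. Only range scans across different values are lost.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedIndexKey {

    // "sha" from the thread, assumed here to be SHA-1 (20-byte digest).
    static byte[] sha1(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    // Appends the given byte arrays in order.
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Index row key: sha(columnValue) + columnValue + columnName + rowKey.
    static byte[] indexRowKey(String columnValue, String columnName, String rowKey) {
        byte[] value = utf8(columnValue);
        return concat(sha1(value), value, utf8(columnName), utf8(rowKey));
    }

    // Scan prefix shared by all index entries for one column value.
    static byte[] lookupPrefix(String columnValue) {
        byte[] value = utf8(columnValue);
        return concat(sha1(value), value);
    }

    public static void main(String[] args) {
        byte[] key = indexRowKey("apple", "color", "row-0001");
        System.out.println(key.length); // 20 + 5 + 5 + 8 = 38
    }
}
```

The hash spreads consecutive values across the key space, and therefore across regions. The plain value is kept after it so the original value can still be read back from the index key.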

> I am getting around 500 inserts/second now (i.e., a total of ~10K puts). This is
> probably comparable with your numbers, based on the data size.
Do all region servers get an equal load, or are some servers busier
than others?
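The ~10K figure follows from the quoted layout. This assumes each logical row costs one put to the main table plus one put per indexed column (20) to the index table. The second line covers main-table row data only, excluding index rows and HBase overhead:

```latex
\begin{aligned}
500\ \tfrac{\text{rows}}{\text{s}} \times (1 + 20)\ \tfrac{\text{puts}}{\text{row}} &= 10{,}500\ \tfrac{\text{puts}}{\text{s}} \\
500\ \tfrac{\text{rows}}{\text{s}} \times 300\ \tfrac{\text{B}}{\text{row}} &= 150{,}000\ \tfrac{\text{B}}{\text{s}} \approx 150\ \tfrac{\text{KB}}{\text{s}}
\end{aligned}
```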

> I have some doubts about the HBase write implementation.
> * Is this the max that we can achieve with any number of region servers? Why is
> adding region servers not improving the write performance? Is it because when
> the data doesn't exist in the table yet, it always writes to one region?
In general, yes. Until the table splits, all writes go to
a single region server.
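Before any split there is only one region, so no key design can spread the load. Once a table has several regions, unsalted index keys that share a leading byte still pile into one of them, while hashed keys spread out. The small Java simulation below illustrates this. It is a sketch under stated assumptions: the table is assumed to have `n` regions bounded by evenly spaced values of the first key byte, compared unsigned to match HBase's lexicographic byte ordering. Region lookup is modeled in plain Java, not through any HBase API, and all names (`RegionSpread`, `splitKeys`, `regionFor`, `histogram`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class RegionSpread {

    // Evenly spaced one-byte split points: n regions need n - 1 boundaries.
    static byte[][] splitKeys(int regions) {
        if (regions < 2 || regions > 256) {
            throw new IllegalArgumentException("regions must be in [2, 256]");
        }
        byte[][] splits = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
            splits[i - 1] = new byte[] {(byte) (i * 256 / regions)};
        }
        return splits;
    }

    // Region index = number of (ascending) split points <= the key's first byte, unsigned.
    static int regionFor(byte[] key, byte[][] splits) {
        int first = key[0] & 0xFF;
        int region = 0;
        for (byte[] split : splits) {
            if (first >= (split[0] & 0xFF)) {
                region++;
            }
        }
        return region;
    }

    static byte[] sha1(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts how many index keys for "value-0".."value-(keys-1)" land in each region.
    static int[] histogram(boolean salted, int keys, int regions) {
        byte[][] splits = splitKeys(regions);
        int[] counts = new int[regions];
        for (int i = 0; i < keys; i++) {
            byte[] value = ("value-" + i).getBytes(StandardCharsets.UTF_8);
            // Only the leading byte decides the region here, so the hash alone
            // stands in for a full salted key and the raw value for an unsalted one.
            byte[] key = salted ? sha1(value) : value;
            counts[regionFor(key, splits)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(histogram(false, 1000, 4))); // [0, 1000, 0, 0]
        System.out.println(Arrays.toString(histogram(true, 1000, 4)));
    }
}
```

All unsalted keys start with 'v' (0x76), which falls between the 0x40 and 0x80 boundaries, so region 1 takes every write. The hashed keys land roughly evenly across the four regions. Later HBase releases also allow split keys to be supplied when a table is created, which avoids the single-region start. Check the admin API documentation for your version.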

> * Probably writing to an existing, well-distributed table might give better
> performance, since the writes will be spread across machines? In that case, if we have
> multiple tables (one per index), will it be better during the initial write
> itself (since each table has a different region)?
Yes. The more servers the writes are spread across, the better.

 Andrey.