HBase user mailing list: HBase secondary index performance


Re: HBase secondary index performance
2010/9/6 Murali Krishna. P <[EMAIL PROTECTED]>:
> Hi,
>   My row size is around 300 bytes, with 20 columns in total. I tried the custom
> indexing without writing to the WAL. Currently I have only 2 tables: one for the
> main table and another for all 20 indexes. The row key of the index table is
> columnValue+columnName+rowKey.

As mentioned before, you can randomize your index insertions.
If you don't need ordered scans or range scans over columnValue, you can
prefix the key with a hash, e.g. sha(columnValue) + columnValue +
columnName + rowKey.
This spreads the index writes across regions and removes the hotspot
on a single region server.
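Below is a minimal Java sketch of the salted index key layout described above. It assumes SHA-1 as the "sha" function and plain byte concatenation of UTF-8 fields with no separators. Neither choice is specified in the thread, and the class and method names are hypothetical. Because the hash depends only on columnValue, every index entry for one value still begins with sha(columnValue) + columnValue. Exact-match lookups therefore remain a prefix scan. What you lose is ordering across different values, which is why the trick only fits when you don't need range scans on columnValue.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper that builds index-table row keys as described in the thread.
final class SaltedIndexKey {

    // Assumption: SHA-1 stands in for "sha(columnValue)"; it yields a 20-byte digest.
    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Joins byte arrays end to end with no separators (an assumption of this sketch).
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    // Original layout: columnValue + columnName + rowKey (entries sort by value).
    static byte[] plainKey(byte[] columnValue, byte[] columnName, byte[] rowKey) {
        return concat(columnValue, columnName, rowKey);
    }

    // Salted layout: sha(columnValue) + columnValue + columnName + rowKey.
    static byte[] saltedKey(byte[] columnValue, byte[] columnName, byte[] rowKey) {
        return concat(sha1(columnValue), columnValue, columnName, rowKey);
    }

    // Every salted entry for this value starts with this prefix,
    // so an exact-match lookup is a prefix scan over the index table.
    static byte[] lookupPrefix(byte[] columnValue) {
        return concat(sha1(columnValue), columnValue);
    }

    public static void main(String[] args) {
        byte[] value = "alice".getBytes(StandardCharsets.UTF_8);
        byte[] name = "email".getBytes(StandardCharsets.UTF_8);
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);

        byte[] key = saltedKey(value, name, row);
        byte[] prefix = lookupPrefix(value);
        System.out.println("plain key length:  " + plainKey(value, name, row).length);
        System.out.println("salted key length: " + key.length);
        System.out.println("starts with lookup prefix: "
                + Arrays.equals(Arrays.copyOf(key, prefix.length), prefix));
    }
}
```

Joining variable-length fields without a separator or length prefix can make the boundary between columnName and rowKey ambiguous. A real schema would need to settle that.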

> I am getting around 500 inserts/second now (i.e., a total of ~10K puts). This is
> probably comparable with your numbers, given the data size.
Do all region servers get an equal load, or are some servers
busier than others?
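The quoted 500 inserts/second is consistent with the ~10K puts figure if each inserted row produces one main-table put plus one index put for each of the 20 indexed columns. That breakdown is an assumption, since the message does not spell it out. The byte rate below counts only the ~300-byte main rows and excludes index entries and HBase per-cell overhead:

```latex
\begin{aligned}
500\ \tfrac{\text{rows}}{\text{s}} \times (1 + 20)\ \tfrac{\text{puts}}{\text{row}} &= 10{,}500\ \tfrac{\text{puts}}{\text{s}} \approx 10\text{K puts/s} \\
500\ \tfrac{\text{rows}}{\text{s}} \times 300\ \tfrac{\text{B}}{\text{row}} &= 150{,}000\ \tfrac{\text{B}}{\text{s}} \approx 150\ \text{KB/s}
\end{aligned}
```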

>  I have some doubts about the HBase write implementation.
> * Is this the maximum we can achieve with any number of region servers? Why
> does adding region servers not improve write performance? Is it because, when
> the data doesn't exist in the table yet, it always writes to one region?
In general, yes. Until the table splits, all writes go to
a single region server.
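Below is a self-contained Java simulation of why writes concentrate in one region. It has no HBase dependency. It assumes a table split on the first key byte into evenly spaced ranges, and it routes each key the way HBase regions are laid out: a region covers [startKey, endKey), and keys compare as unsigned bytes. The class, the methods, and the `user<N>` sample values are all hypothetical. With a single region, every write lands in that region. With four regions, unsalted keys that share a first byte still pile into one region, while SHA-1-prefixed keys spread out roughly evenly.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of how row keys map onto a split table.
final class PreSplitSketch {

    // numRegions regions need numRegions - 1 split keys; here they are
    // single bytes evenly spaced over 0x00..0xFF (an assumption of this sketch).
    static byte[][] splitKeys(int numRegions) {
        if (numRegions < 1 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in 1..256");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = new byte[] {(byte) (i * 256 / numRegions)};
        }
        return splits;
    }

    // Region i covers [splits[i - 1], splits[i]); keys compare as unsigned bytes.
    static int regionFor(byte[] key, byte[][] splits) {
        int region = 0;
        while (region < splits.length && Arrays.compareUnsigned(key, splits[region]) >= 0) {
            region++;
        }
        return region;
    }

    // Counts how many of the given keys land in each region.
    static int[] loadPerRegion(List<byte[]> keys, int numRegions) {
        byte[][] splits = splitKeys(numRegions);
        int[] load = new int[numRegions];
        for (byte[] key : keys) {
            load[regionFor(key, splits)]++;
        }
        return load;
    }

    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Unsalted keys: "user0", "user1", ... all begin with the byte 'u' (0x75).
    static List<byte[]> plainKeys(int n) {
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(("user" + i).getBytes(StandardCharsets.UTF_8));
        }
        return keys;
    }

    // Salted keys: sha1(value) + value, so the first byte is effectively random.
    static List<byte[]> saltedKeys(int n) {
        List<byte[]> keys = new ArrayList<>();
        for (byte[] value : plainKeys(n)) {
            byte[] hash = sha1(value);
            byte[] key = Arrays.copyOf(hash, hash.length + value.length);
            System.arraycopy(value, 0, key, hash.length, value.length);
            keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("1 region, salted:  " + Arrays.toString(loadPerRegion(saltedKeys(10_000), 1)));
        System.out.println("4 regions, plain:  " + Arrays.toString(loadPerRegion(plainKeys(10_000), 4)));
        System.out.println("4 regions, salted: " + Arrays.toString(loadPerRegion(saltedKeys(10_000), 4)));
    }
}
```

The first line prints `[10000]`. The second prints `[0, 10000, 0, 0]`, because 0x75 falls in the second range [0x40, 0x80). The third shows four counts near 2,500 each. In HBase, split keys like these would be supplied at table creation time. Check your version's admin API for a `createTable` variant that accepts split keys.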

> * Would writing to an existing, well-distributed table give better
> performance, since the writes would be spread across machines? In that case, if we have
> multiple tables (one per index), would that be better during the initial write
> itself (since each table has a different region)?
The more servers that take part in the writes, the better.

 Andrey.