HBase >> mail # dev >> performance improvment on regionserver.MemStore/updateColumnValue


Re: performance improvment on regionserver.MemStore/updateColumnValue
I would compare YCSB results between the patched and unpatched versions;
that gives a more realistic workload than the unit tests provide.

-Joey

On Wed, Jul 20, 2011 at 10:49 AM, N Keywal <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Some context: we're thinking about using HBase for a product
> we're developing. For this reason, I am currently looking at the HBase source
> code to understand how to debug and modify HBase. To start with something
> simple but useful, I am looking for performance improvements by profiling
> HBase during the execution of the unit tests. I expect that many of the
> hotspots found in the unit tests are also hotspots in real production. I
> plan to spend around 10 man-days on this until September.
>
>
> The method regionserver.MemStore/updateColumnValue seems to be heavily used,
> and is ultimately responsible for 30% of the time in the test subsets I am using.
>
>
> One part of it can be optimized easily by changing the order of the
> conditions:
>
>        if (firstKv.matchingQualifier(kv)) {
>          if (kv.getType() == KeyValue.Type.Put.getCode()) {
>            now = Math.max(now, kv.getTimestamp());
>          }
>        }
>
>  becomes:
>                if (kv.getType() == KeyValue.Type.Put.getCode() &&
>                        kv.getTimestamp() > now &&
>                        firstKv.matchingQualifier(kv)) {
>                    now = kv.getTimestamp();
>                }
>
>  Since comparing the qualifier is much more expensive than the other two
>  checks, we put it last. This improves performance by 3% (i.e., total
>  execution time is lowered by 3%).
>
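
To make the reordering concrete, here is a minimal Java sketch. Cell is a hypothetical stand-in for KeyValue, not the HBase class; its type codes and the comparison counter exist only for this illustration. It assumes, as in the quoted snippets, that the type and timestamp checks are cheap and side-effect free, so moving the qualifier comparison behind them via the short-circuiting && leaves the result unchanged and only skips comparisons.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for KeyValue; NOT the HBase class.
final class Cell {
    static final byte PUT = 4;     // illustrative type codes for this sketch
    static final byte DELETE = 8;
    static int qualifierComparisons = 0; // counts calls to the "expensive" check

    final byte type;
    final long timestamp;
    final byte[] qualifier;

    Cell(byte type, long timestamp, String qualifier) {
        this.type = type;
        this.timestamp = timestamp;
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
    }

    // Byte-array comparison: the costly part of the condition.
    boolean matchingQualifier(Cell other) {
        qualifierComparisons++;
        return Arrays.equals(qualifier, other.qualifier);
    }
}

final class ConditionOrderSketch {
    // Original order: the qualifier is compared for every cell.
    static long original(Cell firstKv, List<Cell> cells, long now) {
        for (Cell kv : cells) {
            if (firstKv.matchingQualifier(kv)) {
                if (kv.type == Cell.PUT) {
                    now = Math.max(now, kv.timestamp);
                }
            }
        }
        return now;
    }

    // Reordered: cheap checks first; && stops before the qualifier
    // comparison whenever the type or timestamp check already fails.
    static long reordered(Cell firstKv, List<Cell> cells, long now) {
        for (Cell kv : cells) {
            if (kv.type == Cell.PUT
                    && kv.timestamp > now
                    && firstKv.matchingQualifier(kv)) {
                now = kv.timestamp;
            }
        }
        return now;
    }

    public static void main(String[] args) {
        Cell firstKv = new Cell(Cell.PUT, 0L, "q");
        List<Cell> cells = Arrays.asList(
                new Cell(Cell.PUT, 5L, "q"),
                new Cell(Cell.DELETE, 9L, "q"),
                new Cell(Cell.PUT, 3L, "q"),
                new Cell(Cell.PUT, 7L, "other"));

        Cell.qualifierComparisons = 0;
        long a = original(firstKv, cells, 1L);
        System.out.println("original:  now=" + a
                + " comparisons=" + Cell.qualifierComparisons);

        Cell.qualifierComparisons = 0;
        long b = reordered(firstKv, cells, 1L);
        System.out.println("reordered: now=" + b
                + " comparisons=" + Cell.qualifierComparisons);
    }
}
```

Both methods return the same value of now; the reordered one just reaches matchingQualifier less often. How much that saves in practice depends on the data, which is one more reason to measure under a realistic workload.
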
>
>  So, first question: would you be interested in a patch for this kind of
> change?
>
>
>
>  Second question (more technical...): in this method
> (regionserver.MemStore/updateColumnValue), I see:
>
>            KeyValue firstKv = KeyValue.createFirstOnRow(
>                    row, family, qualifier);
>
>            [...]
>            while (it.hasNext()) {
>                KeyValue kv = it.next();
>
>                // if this isnt the row we are interested in, then bail:
>                if (!firstKv.matchingColumn(family, qualifier) ||
>                        !firstKv.matchingRow(kv)) {
>                    break; // rows dont match, bail.
>                }
>
>                [...]
>            }
>
>   For the test "firstKv.matchingColumn(family, qualifier)", I don't see:
>   1) Why it is tested inside the loop: firstKv is not modified, so the result
> won't change.
>   2) How the result can be 'false', since firstKv is initialized with the
> same family and qualifier.
>
>   Or is firstKv shared and updated in one way or another?
>
>   If we can remove it, we gain another 2%...
>
>
>   N.
>
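
For the second question, a minimal Java sketch of what hoisting the loop-invariant test would look like. Kv is a hypothetical stand-in for KeyValue, and inLoop/hoisted are names made up for this illustration. It only shows that the two forms agree as long as firstKv, family and qualifier are not modified while the loop runs; it says nothing about whether the real HBase code relies on that check for some other reason.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for KeyValue; NOT the HBase class.
final class Kv {
    final byte[] row;
    final byte[] family;
    final byte[] qualifier;

    Kv(String row, String family, String qualifier) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.family = family.getBytes(StandardCharsets.UTF_8);
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
    }

    boolean matchingColumn(byte[] f, byte[] q) {
        return Arrays.equals(family, f) && Arrays.equals(qualifier, q);
    }

    boolean matchingRow(Kv other) {
        return Arrays.equals(row, other.row);
    }
}

final class LoopInvariantSketch {
    // Shape of the quoted loop: the column test is repeated per element,
    // although none of its inputs change inside the loop.
    static int inLoop(Kv firstKv, byte[] family, byte[] qualifier, List<Kv> kvs) {
        int visited = 0;
        for (Kv kv : kvs) {
            if (!firstKv.matchingColumn(family, qualifier)
                    || !firstKv.matchingRow(kv)) {
                break; // column or row does not match, bail
            }
            visited++;
        }
        return visited;
    }

    // Hoisted form: the invariant test runs once, before the loop.
    static int hoisted(Kv firstKv, byte[] family, byte[] qualifier, List<Kv> kvs) {
        if (!firstKv.matchingColumn(family, qualifier)) {
            return 0;
        }
        int visited = 0;
        for (Kv kv : kvs) {
            if (!firstKv.matchingRow(kv)) {
                break; // rows do not match, bail
            }
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        byte[] family = "f".getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = "q".getBytes(StandardCharsets.UTF_8);
        Kv firstKv = new Kv("r1", "f", "q");
        List<Kv> kvs = Arrays.asList(
                new Kv("r1", "f", "q"),
                new Kv("r1", "f", "q"),
                new Kv("r2", "f", "q"));
        System.out.println("inLoop:  " + inLoop(firstKv, family, qualifier, kvs));
        System.out.println("hoisted: " + hoisted(firstKv, family, qualifier, kvs));
    }
}
```

If firstKv is built from the same family and qualifier that are passed to matchingColumn, the hoisted test is always true in this sketch and could be dropped entirely, which is the removal the mail suggests.
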

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434