Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase >> mail # dev >> performance improvment on regionserver.MemStore/updateColumnValue


+
N Keywal 2011-07-20, 14:49
Copy link to this message
-
Re: performance improvment on regionserver.MemStore/updateColumnValue
I would compare YCSB between the patched and unpatched version for a more
realistic workload than the unit tests provide.

-Joey

On Wed, Jul 20, 2011 at 10:49 AM, N Keywal <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Some words on the context: We're thinking about using HBase for a product
> we're developping. For this reason, I am currently looking at HBase source
> code to understand  how to debug & modify HBase.  To start with something
> simple but useful, I am looking for performance improvement by profiling
> hbase during the execution of the unit tests. I expect that many of the
> hotspots found on the unit tests are also hotspots in real production. I
> plan to spend around 10 m.d on this until september.
>
>
> The method regionserver.MemStore/updateColumnValue seems quite used, and is
> ultimatly responsible of 30% of the time in the test subsets I am using.
>
>
> There is bit of it that can be optimized easily by changing the conditions
> order:
>
>        if (firstKv.matchingQualifier(kv)) {
>          if (kv.getType() == KeyValue.Type.Put.getCode()) {
>            now = Math.max(now, kv.getTimestamp());
>          }
>        }
>
>  becomes:
>                if (kv.getType() == KeyValue.Type.Put.getCode() &&
>                        kv.getTimestamp() > now &&
>                        firstKv.matchingQualifier(kv)) {
>                    now = kv.getTimestamp();
>                }
>
>  As comparing the qualifier is much more expensive, we put it at the end.
>  It improve the performances by 3% (i.e: total execution time lowered by
> 3%).
>
>
>  So first question: would you be interested by a patch for this kind of
> stuff?
>
>
>
>  Second question (more technical...): in this method
> (regionserver.MemStore/updateColumnValue), I see:
>
>            KeyValue firstKv = KeyValue.createFirstOnRow(
>                    row, family, qualifier);
>
>            [...]
>            while (it.hasNext()) {
>                KeyValue kv = it.next();
>
>                // if this isnt the row we are interested in, then bail:
>                if (!firstKv.matchingColumn(family, qualifier) ||
> !firstKv.matchingRow(kv)) {
>                    break; // rows dont match, bail.
>                }
>
>                [...]
>            }
>
>   For the test "firstKv.matchingColumn(family, qualifier)", I don't see:
>   1) Why it is tested in the loop, as firstKv is not modified, the result
> won't change.
>   2) How the result can be 'false', as firstKv is inialized with the family
> and the parameters.
>
>   Or is it shared for update a way or another?
>
>   If we can remove it, we gain another 2%...
>
>
>   N.
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
+
N Keywal 2011-07-20, 15:28
+
Joey Echeverria 2011-07-20, 15:32
+
Ted Yu 2011-07-20, 15:08
+
N Keywal 2011-07-20, 15:20
+
Stack 2011-07-20, 15:15
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB