Re: performance improvement on regionserver.MemStore/updateColumnValue
My suggestion was just to verify the performance gains. Do the profiling on
the unit tests, and then run scale-up tests with YCSB.

-Joey

On Wed, Jul 20, 2011 at 11:28 AM, N Keywal <[EMAIL PROTECTED]> wrote:

> Agreed. But there is a big advantage when you work on the issues found in
> the unit tests: the code you're modifying is already covered by the unit
> tests... :-)
>
> On Wed, Jul 20, 2011 at 5:02 PM, Joey Echeverria <[EMAIL PROTECTED]>
> wrote:
>
> > I would compare YCSB between the patched and unpatched version for a more
> > realistic workload than the unit tests provide.
> >
> > -Joey
> >
> > On Wed, Jul 20, 2011 at 10:49 AM, N Keywal <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > Some words on the context: we're thinking about using HBase for a product
> > > we're developing. For this reason, I am currently looking at the HBase
> > > source code to understand how to debug and modify HBase. To start with
> > > something simple but useful, I am looking for performance improvements by
> > > profiling HBase during the execution of the unit tests. I expect that many
> > > of the hotspots found in the unit tests are also hotspots in real
> > > production. I plan to spend around 10 man-days on this until September.
> > >
> > >
> > > The method regionserver.MemStore/updateColumnValue seems to be heavily
> > > used, and is ultimately responsible for 30% of the time in the test
> > > subsets I am using.
> > >
> > >
> > > One part of it can be optimized easily by changing the order of the
> > > conditions:
> > >
> > >        if (firstKv.matchingQualifier(kv)) {
> > >          if (kv.getType() == KeyValue.Type.Put.getCode()) {
> > >            now = Math.max(now, kv.getTimestamp());
> > >          }
> > >        }
> > >
> > >  becomes:
> > >                if (kv.getType() == KeyValue.Type.Put.getCode() &&
> > >                        kv.getTimestamp() > now &&
> > >                        firstKv.matchingQualifier(kv)) {
> > >                    now = kv.getTimestamp();
> > >                }
> > >
> > >  As comparing the qualifier is much more expensive, we put it at the end.
> > >  It improves performance by 3% (i.e. total execution time is lowered by
> > >  3%).
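
A minimal, self-contained sketch of the reordering described above, for readers who want to try it outside HBase. The `Cell` class, the type codes, and the comparison counter are hypothetical stand-ins and not the real `KeyValue` API; the sketch only illustrates that Java's `&&` short-circuits, so putting the cheap type and timestamp checks first skips the byte-array comparison whenever they fail.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the quoted HBase code; not the real KeyValue API.
class ConditionOrderSketch {
    static final byte PUT = 4;     // assumed type codes, for illustration only
    static final byte DELETE = 8;

    /** Minimal cell holding only the fields the check needs. */
    static final class Cell {
        final byte type;
        final long timestamp;
        final byte[] qualifier;

        Cell(byte type, long timestamp, String qualifier) {
            this.type = type;
            this.timestamp = timestamp;
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        }
    }

    static final Cell FIRST = new Cell(PUT, 0, "q");

    // Counts how often the expensive byte-array comparison actually runs.
    static int qualifierComparisons = 0;

    static boolean matchingQualifier(Cell a, Cell b) {
        qualifierComparisons++;
        return Arrays.equals(a.qualifier, b.qualifier);
    }

    /** Original order: the qualifier is compared for every cell. */
    static long original(Cell first, List<Cell> cells, long now) {
        for (Cell kv : cells) {
            if (matchingQualifier(first, kv)) {
                if (kv.type == PUT) {
                    now = Math.max(now, kv.timestamp);
                }
            }
        }
        return now;
    }

    /** Reordered: cheap checks first, qualifier compared only if they pass. */
    static long reordered(Cell first, List<Cell> cells, long now) {
        for (Cell kv : cells) {
            if (kv.type == PUT
                    && kv.timestamp > now
                    && matchingQualifier(first, kv)) {
                now = kv.timestamp;
            }
        }
        return now;
    }

    static List<Cell> sample() {
        return Arrays.asList(
                new Cell(PUT, 5, "q"),
                new Cell(DELETE, 9, "q"),
                new Cell(PUT, 3, "q"),
                new Cell(PUT, 7, "x"),
                new Cell(PUT, 8, "q"));
    }

    public static void main(String[] args) {
        qualifierComparisons = 0;
        long a = original(FIRST, sample(), 4L);
        System.out.println("original:  now=" + a + " comparisons=" + qualifierComparisons);

        qualifierComparisons = 0;
        long b = reordered(FIRST, sample(), 4L);
        System.out.println("reordered: now=" + b + " comparisons=" + qualifierComparisons);
    }
}
```

On this sample data both variants end with the same `now`, while the reordered one runs fewer qualifier comparisons. How much that saves in a real region server depends on the workload, which is what the YCSB comparison suggested in this thread would measure.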
> > >
> > >
> > >  So first question: would you be interested in a patch for this kind of
> > > stuff?
> > >
> > >
> > >
> > >  Second question (more technical...): in this method
> > > (regionserver.MemStore/updateColumnValue), I see:
> > >
> > >            KeyValue firstKv = KeyValue.createFirstOnRow(
> > >                    row, family, qualifier);
> > >
> > >            [...]
> > >            while (it.hasNext()) {
> > >                KeyValue kv = it.next();
> > >
> > >                // if this isnt the row we are interested in, then bail:
> > >                if (!firstKv.matchingColumn(family, qualifier) ||
> > >                        !firstKv.matchingRow(kv)) {
> > >                    break; // rows dont match, bail.
> > >                }
> > >
> > >                [...]
> > >            }
> > >
> > >   For the test "firstKv.matchingColumn(family, qualifier)", I don't see:
> > >   1) Why it is tested in the loop: as firstKv is not modified, the result
> > >      won't change.
> > >   2) How the result can be 'false', as firstKv is initialized with the
> > >      same family and qualifier parameters.
> > >
> > >   Or is it shared and updated in one way or another?
> > >
> > >   If we can remove it, we gain another 2%...
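
The question above can be made concrete with a small sketch. It assumes what the question itself assumes, namely that `firstKv` is not modified or shared while the loop runs; whether that holds in HBase is exactly what is being asked, so this is not a claim about the real code. The `Cell` class and both methods are hypothetical stand-ins. Under that assumption, the column test is loop-invariant and can be evaluated once before the loop without changing the result.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the quoted loop; not the real MemStore/KeyValue code.
class LoopInvariantSketch {
    /** Immutable cell: its fields never change after construction (an assumption). */
    static final class Cell {
        final byte[] row;
        final byte[] family;
        final byte[] qualifier;

        Cell(String row, String family, String qualifier) {
            this.row = bytes(row);
            this.family = bytes(family);
            this.qualifier = bytes(qualifier);
        }

        boolean matchingColumn(byte[] f, byte[] q) {
            return Arrays.equals(family, f) && Arrays.equals(qualifier, q);
        }

        boolean matchingRow(Cell other) {
            return Arrays.equals(row, other.row);
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Shape of the quoted loop: the invariant column test runs on every iteration. */
    static int visitInLoop(Cell firstKv, byte[] family, byte[] qualifier, List<Cell> cells) {
        int visited = 0;
        for (Cell kv : cells) {
            if (!firstKv.matchingColumn(family, qualifier) || !firstKv.matchingRow(kv)) {
                break;
            }
            visited++;
        }
        return visited;
    }

    /** Hoisted variant: the column test runs once, only the row test stays in the loop. */
    static int visitHoisted(Cell firstKv, byte[] family, byte[] qualifier, List<Cell> cells) {
        if (!firstKv.matchingColumn(family, qualifier)) {
            return 0;
        }
        int visited = 0;
        for (Cell kv : cells) {
            if (!firstKv.matchingRow(kv)) {
                break;
            }
            visited++;
        }
        return visited;
    }

    static List<Cell> sample() {
        return Arrays.asList(
                new Cell("r1", "f", "q"),
                new Cell("r1", "f", "q"),
                new Cell("r2", "f", "q"),
                new Cell("r1", "f", "q"));
    }

    public static void main(String[] args) {
        Cell firstKv = new Cell("r1", "f", "q");
        System.out.println(visitInLoop(firstKv, bytes("f"), bytes("q"), sample()));
        System.out.println(visitHoisted(firstKv, bytes("f"), bytes("q"), sample()));
    }
}
```

If `firstKv` could be mutated by another thread during the loop, the two variants would no longer be equivalent, so that would need to be ruled out before removing or hoisting the test.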
> > >
> > >
> > >   N.
> > >
> >
> >
> >
> > --
> > Joseph Echeverria
> > Cloudera, Inc.
> > 443.305.9434
> >
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434