Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # dev >> performance improvment on regionserver.MemStore/updateColumnValue


Copy link to this message
-
Re: performance improvment on regionserver.MemStore/updateColumnValue
My suggestion was just to verify the performance gains. Do the profiling on
unit tests and then do scale up tests with YCSB.

-Joey

On Wed, Jul 20, 2011 at 11:28 AM, N Keywal <[EMAIL PROTECTED]> wrote:

> Aggreed. But there is a big advantage when you work on the issues found on
> the unit tesst: the code you're modifying is already covered by the unit
> tests... :-)
>
> On Wed, Jul 20, 2011 at 5:02 PM, Joey Echeverria <[EMAIL PROTECTED]>
> wrote:
>
> > I would compare YCSB between the patched and unpatched version for a more
> > realistic workload than the unit tests provide.
> >
> > -Joey
> >
> > On Wed, Jul 20, 2011 at 10:49 AM, N Keywal <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > Some words on the context: We're thinking about using HBase for a
> product
> > > we're developping. For this reason, I am currently looking at HBase
> > source
> > > code to understand  how to debug & modify HBase.  To start with
> something
> > > simple but useful, I am looking for performance improvement by
> profiling
> > > hbase during the execution of the unit tests. I expect that many of the
> > > hotspots found on the unit tests are also hotspots in real production.
> I
> > > plan to spend around 10 m.d on this until september.
> > >
> > >
> > > The method regionserver.MemStore/updateColumnValue seems quite used,
> and
> > is
> > > ultimatly responsible of 30% of the time in the test subsets I am
> using.
> > >
> > >
> > > There is bit of it that can be optimized easily by changing the
> > conditions
> > > order:
> > >
> > >        if (firstKv.matchingQualifier(kv)) {
> > >          if (kv.getType() == KeyValue.Type.Put.getCode()) {
> > >            now = Math.max(now, kv.getTimestamp());
> > >          }
> > >        }
> > >
> > >  becomes:
> > >                if (kv.getType() == KeyValue.Type.Put.getCode() &&
> > >                        kv.getTimestamp() > now &&
> > >                        firstKv.matchingQualifier(kv)) {
> > >                    now = kv.getTimestamp();
> > >                }
> > >
> > >  As comparing the qualifier is much more expensive, we put it at the
> end.
> > >  It improve the performances by 3% (i.e: total execution time lowered
> by
> > > 3%).
> > >
> > >
> > >  So first question: would you be interested by a patch for this kind of
> > > stuff?
> > >
> > >
> > >
> > >  Second question (more technical...): in this method
> > > (regionserver.MemStore/updateColumnValue), I see:
> > >
> > >            KeyValue firstKv = KeyValue.createFirstOnRow(
> > >                    row, family, qualifier);
> > >
> > >            [...]
> > >            while (it.hasNext()) {
> > >                KeyValue kv = it.next();
> > >
> > >                // if this isnt the row we are interested in, then bail:
> > >                if (!firstKv.matchingColumn(family, qualifier) ||
> > > !firstKv.matchingRow(kv)) {
> > >                    break; // rows dont match, bail.
> > >                }
> > >
> > >                [...]
> > >            }
> > >
> > >   For the test "firstKv.matchingColumn(family, qualifier)", I don't
> see:
> > >   1) Why it is tested in the loop, as firstKv is not modified, the
> result
> > > won't change.
> > >   2) How the result can be 'false', as firstKv is inialized with the
> > family
> > > and the parameters.
> > >
> > >   Or is it shared for update a way or another?
> > >
> > >   If we can remove it, we gain another 2%...
> > >
> > >
> > >   N.
> > >
> >
> >
> >
> > --
> > Joseph Echeverria
> > Cloudera, Inc.
> > 443.305.9434
> >
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434