Re: endpoint coprocessor performance
> In current logic, HTable#coprocessorExec always scan the whole table, its
> efficiency is low

No, I don't think that is correct.

In its current logic, coprocessorExec always scans the META table for all
regions of the target table to find their up-to-date locations, and then
dispatches the exec in parallel to all regions of the target table. The
efficiency of the exec itself is actually high because invocations happen in
parallel across the cluster, with results reassembled back at the client as
they come in.
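
To make the dispatch pattern concrete, here is a minimal plain-Java sketch of
the idea, not HBase client code. The names ParallelExecSketch, countRows and
execOnAllRegions are hypothetical, the map of regions stands in for the list
of regions found by the META scan, and the per-region row count stands in for
whatever the endpoint computes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelExecSketch {

    /** Hypothetical stand-in for the endpoint: counts the rows of one region. */
    static long countRows(List<String> regionRows) {
        return regionRows.size();
    }

    /** Submits one call per region, then sums results in completion order. */
    static long execOnAllRegions(Map<String, List<String>> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            CompletionService<Long> completion = new ExecutorCompletionService<>(pool);
            for (List<String> rows : regions.values()) {
                completion.submit(() -> countRows(rows));
            }
            long total = 0;
            for (int i = 0; i < regions.size(); i++) {
                // take() blocks until the next region's result is ready.
                total += completion.take().get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Region start key -> rows held by that region (made-up data).
        Map<String, List<String>> regions = new TreeMap<>();
        regions.put("", Arrays.asList("a", "b", "c"));
        regions.put("m", Arrays.asList("m", "n"));
        regions.put("t", Arrays.asList("t"));
        System.out.println(execOnAllRegions(regions)); // prints 6
    }
}
```

The cost of the parallel part is bounded by the slowest region rather than by
the sum over all regions, which is why the exec itself is not the bottleneck.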

The increased setup latency relative to a Scan and the load on META are both
due to the initial scan on META to find the up-to-date locations of all
regions of the target table. For a Scan, the cached locations of regions
are used, and relocations are handled transparently by the client. Exec
could be updated to do this as well.
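
As a rough illustration of "use the cache, relocate only on failure", here is
a small plain-Java simulation. It does not use the HBase client API; the class
CachedLocationSketch and its methods locate, callRegion and execWithRetry are
hypothetical, and the meta map merely stands in for the authoritative region
locations kept in META.

```java
import java.util.HashMap;
import java.util.Map;

class CachedLocationSketch {

    /** Hypothetical stand-in for META: region name -> current server. */
    final Map<String, String> meta = new HashMap<>();
    /** Client-side cache of region locations; may be stale. */
    final Map<String, String> cache = new HashMap<>();
    /** Number of times the simulated META was consulted. */
    int metaLookups = 0;

    /** Returns the cached location, consulting META only on a miss or forced refresh. */
    String locate(String region, boolean forceRefresh) {
        if (forceRefresh || !cache.containsKey(region)) {
            metaLookups++;
            cache.put(region, meta.get(region));
        }
        return cache.get(region);
    }

    /** Simulated remote call: fails if the region is not hosted on that server. */
    String callRegion(String region, String server) {
        if (!server.equals(meta.get(region))) {
            throw new IllegalStateException("region " + region + " is not on " + server);
        }
        return region + "@" + server;
    }

    /** Tries the cached location first and refreshes it once if the region moved. */
    String execWithRetry(String region) {
        String server = locate(region, false);
        try {
            return callRegion(region, server);
        } catch (IllegalStateException moved) {
            return callRegion(region, locate(region, true));
        }
    }

    public static void main(String[] args) {
        CachedLocationSketch client = new CachedLocationSketch();
        client.meta.put("r1", "serverA");
        System.out.println(client.execWithRetry("r1")); // r1@serverA (one META lookup)
        System.out.println(client.execWithRetry("r1")); // r1@serverA (served from cache)
        client.meta.put("r1", "serverB");               // the region moves
        System.out.println(client.execWithRetry("r1")); // r1@serverB (stale cache, refreshed)
        System.out.println(client.metaLookups);         // 2
    }
}
```

In this toy model META is touched once per region up front and once more per
relocation, instead of once per invocation.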
On Wed, Mar 6, 2013 at 5:13 AM, Kim Hamilton <[EMAIL PROTECTED]> wrote:

> Thanks so much! This describes exactly what I'm seeing. I did notice
> extremely heavy load on the region server carrying .META., as described in
> HBASE-6870:
>
> In current logic, HTable#coprocessorExec always scan the whole table,
> its efficiency
> is low and will affect the Regionserver carrying .META. under large
> coprocessorExec requests
>
>
> Thanks again,
> Kim
> On Mon, Mar 4, 2013 at 8:08 PM, Stephen Boesch <[EMAIL PROTECTED]> wrote:
>
> > great question from Kim and follow-up/answers.
> >
> >
> > 2013/3/4 Gary Helmling <[EMAIL PROTECTED]>
> >
> > > I see this is HBASE-6870.  I thought that sounded familiar.
> > >
> > >
> > > > On Mon, Mar 4, 2013 at 6:23 PM, Gary Helmling <[EMAIL PROTECTED]> wrote:
> > >
> > > >
> > > > >> Check your logs for whether your end-point coprocessor is hitting
> > > > >> zookeeper on every invocation to figure out the region start key.
> > > > >> Unfortunately (at least last time I checked), the default way of invoking
> > > > >> an end point coprocessor doesn't use the meta cache. You can go through a
> > > > >> combination of the following instead:
> > > > >>     HRegionLocation regionLocation = retried ?
> > > > >>         connection.relocateRegion(tableName, tableKey) :
> > > > >>         connection.locateRegion(tableName, tableKey);
> > > >>     ...
> > > > >> Then call HConnection.processExecs, passing in the regionKeys from
> > > > >> above.
> > > > >> You can trap the error case of the region being relocated and try again
> > > > >> with retried = true, and it'll update the meta data cache when
> > > > >> relocateRegion is called.
> > > >>
> > > >
> > > >
> > > > Any idea if we have an improvement logged in JIRA for this?  This is
> > > > definitely something we should improve on.
> > > >
> > >
> >
>

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)