RE: endpoint coprocessor performance
Anoop Sam John 2013-03-06, 03:14
Yes, I agree with Andrew here. I checked the 0.94 code base yesterday. I also feel that the efficiency should be on the higher side, and there is no whole-table scan. The HBase client issues a scan only for those regions that fall within the start/stop keys the application specified. Yes, it does contact .META. to find the regions that fall within the start/stop rows, but IMHO that should not be a big efficiency issue either.
@Kim - Can you do some profiling and let us know which area of code is eating up time in your case?
I am also seeing HBASE-6877.
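
To illustrate the point about only touching regions within the start/stop keys, here is a minimal sketch of that selection step. It is not HBase code: `RegionRangeSketch`, `Region`, and `regionsInRange` are hypothetical names, keys are plain Strings rather than byte arrays, and the range is assumed to be half-open, [start, stop), with an empty key meaning "unbounded".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not an HBase class: picks the regions whose key range
// overlaps a requested [start, stop) range.
final class RegionRangeSketch {

    /** A region covers [startKey, endKey); an empty endKey means "to the end of the table". */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Returns the regions overlapping [start, stop); an empty stop means "no upper bound". */
    static List<Region> regionsInRange(List<Region> regions, String start, String stop) {
        List<Region> selected = new ArrayList<>();
        for (Region r : regions) {
            boolean startsBeforeStop = stop.isEmpty() || r.startKey.compareTo(stop) < 0;
            boolean endsAfterStart = r.endKey.isEmpty() || r.endKey.compareTo(start) > 0;
            if (startsBeforeStop && endsAfterStart) {
                selected.add(r);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("", "g"));
        regions.add(new Region("g", "n"));
        regions.add(new Region("n", "t"));
        regions.add(new Region("t", ""));

        // Only the regions overlapping ["h", "p") are selected, not the whole table.
        for (Region r : regionsInRange(regions, "h", "p")) {
            System.out.println("[" + r.startKey + ", " + r.endKey + ")");
        }
    }
}
```

With the four example regions above, a request for ["h", "p") selects the two middle regions only; the first and last regions are never contacted.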
From: Andrew Purtell [[EMAIL PROTECTED]]
Sent: Wednesday, March 06, 2013 7:28 AM
To: [EMAIL PROTECTED]
Subject: Re: endpoint coprocessor performance
> In current logic, HTable#coprocessorExec always scan the whole table, its
> efficiency is low
No, I don't think that is correct.
In its current logic, coprocessorExec always scans the META table for all
regions of the target table, to find the up to date locations, and then
dispatches the exec in parallel to all regions of the target table. The
efficiency of the exec is actually high because invocations happen in
parallel across the cluster, with results reassembled back at the client as
they come in.
The increased setup latency relative to a Scan and the load on META is
because of the initial scan on META to find the up to date locations of all
regions of the target table. For a Scan, the cached locations of regions
are used, and relocations are handled transparently by the client. Exec
could be updated to do this as well.
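
The difference between scanning META on every exec and reusing cached locations can be sketched roughly as follows. This is a self-contained simulation, not the HBase client: `CachedLocatorSketch`, `StaleLocationException`, `locate`, and `execWithRetry` are hypothetical names, and the `Map` standing in for META is an assumption made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch, not an HBase class: region locations are cached and
// META is consulted again only when a call reports that the location is stale.
final class CachedLocatorSketch {

    /** Raised by the simulated call when the region is no longer on the cached server. */
    static final class StaleLocationException extends RuntimeException {}

    private final Map<String, String> cache = new HashMap<>(); // region key -> server name
    private final Function<String, String> metaLookup;         // stands in for a META read
    int metaLookups = 0;                                       // counts simulated META reads

    CachedLocatorSketch(Function<String, String> metaLookup) {
        this.metaLookup = metaLookup;
    }

    /** Returns the cached location, reading "META" only on a miss or a forced reload. */
    String locate(String regionKey, boolean reload) {
        if (reload || !cache.containsKey(regionKey)) {
            metaLookups++;
            cache.put(regionKey, metaLookup.apply(regionKey));
        }
        return cache.get(regionKey);
    }

    /** Tries the cached location first; on a stale location, reloads once and retries. */
    <T> T execWithRetry(String regionKey, BiFunction<String, String, T> call) {
        try {
            return call.apply(locate(regionKey, false), regionKey);
        } catch (StaleLocationException e) {
            return call.apply(locate(regionKey, true), regionKey);
        }
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("row-a", "rs1");
        CachedLocatorSketch locator = new CachedLocatorSketch(meta::get);

        // Simulated exec: fails if the caller used an out-of-date server.
        BiFunction<String, String, String> call = (server, key) -> {
            if (!server.equals(meta.get(key))) {
                throw new StaleLocationException();
            }
            return "ran on " + server;
        };

        for (int i = 0; i < 10; i++) {
            locator.execWithRetry("row-a", call);
        }
        System.out.println("META lookups after 10 calls: " + locator.metaLookups);

        meta.put("row-a", "rs2"); // the region moves to another server
        System.out.println(locator.execWithRetry("row-a", call));
        System.out.println("META lookups after move: " + locator.metaLookups);
    }
}
```

In this simulation, ten calls against an unmoved region cost a single lookup, and a region move costs one additional lookup on the retry, which is the behaviour described above for Scan.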
On Wed, Mar 6, 2013 at 5:13 AM, Kim Hamilton <[EMAIL PROTECTED]> wrote:
> Thanks so much! This describes exactly what I'm seeing. I did notice
> extremely heavy load on the region server carrying .META., as described in
> In current logic, HTable#coprocessorExec always scan the whole table,
> its efficiency
> is low and will affect the Regionserver carrying .META. under large
> coprocessorExec requests
> Thanks again,
> On Mon, Mar 4, 2013 at 8:08 PM, Stephen Boesch <[EMAIL PROTECTED]> wrote:
> > great question from Kim and follow-up/answers.
> > 2013/3/4 Gary Helmling <[EMAIL PROTECTED]>
> > > I see this is HBASE-6870. I thought that sounded familiar.
> > >
> > >
> > > On Mon, Mar 4, 2013 at 6:23 PM, Gary Helmling <[EMAIL PROTECTED]> wrote:
> > >
> > > >
> > > > >> Check your logs for whether your end-point coprocessor is hitting
> > > > >> zookeeper on every invocation to figure out the region start key.
> > > > >> Unfortunately (at least last time I checked), the default way of invoking
> > > > >> an end point coprocessor doesn't use the meta cache. You can go through a
> > > > >> combination of the following instead:
> > > > >> HRegionLocation regionLocation = retried ?
> > > > >> connection.relocateRegion(tableName, tableKey) :
> > > > >> connection.locateRegion(tableName, tableKey);
> > > > >> ...
> > > > >> Then call HConnection.processExecs call, passing in the regionKeys from
> > > > >> above.
> > > > >> You can trap the error case of the region being relocated and try again
> > > > >> with retried = true and it'll update the meta data cache when
> > > > >> relocateRegion is called.
> > > >>
> > > >
> > > >
> > > > Any idea if we have an improvement logged in JIRA for this? This is
> > > > definitely something we should improve on.
> > > >
> > >
Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)