HBase, mail # user - Client Get vs Coprocessor scan performance


Re: Client Get vs Coprocessor scan performance
Ted Yu 2013-08-17, 23:19
HBASE-6870 targeted the whole-table scanning done on each coprocessorService
call, which showed up through this call path:

HTable#coprocessorService -> getStartKeysInRange -> getStartEndKeys ->
getRegionLocations -> MetaScanner.allTableRegions(getConfiguration(),
getTableName(), false)

With the fix, the cached region locations in HConnectionImplementation are used instead.
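
To illustrate why this matters, here is a toy model of the difference between rescanning region metadata on every call and reusing cached locations. It is only a sketch: FakeMeta, CachingLocator and startKeysUncached are hypothetical names made up for this example. They are not HBase classes, and this is not the HBASE-6870 patch itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are HBase APIs. */
public class RegionLookupSketch {

    /** Hypothetical stand-in for the meta table; counts full scans. */
    public static class FakeMeta {
        public int fullScans = 0;
        private final Map<String, List<String>> startKeysByTable = new HashMap<>();

        public void addTable(String table, List<String> startKeys) {
            startKeysByTable.put(table, new ArrayList<>(startKeys));
        }

        /** Models an expensive "list every region of this table" scan. */
        public List<String> scanAllRegions(String table) {
            fullScans++;
            return new ArrayList<>(startKeysByTable.get(table));
        }
    }

    /** Models the uncached path: every call rescans the metadata. */
    public static List<String> startKeysUncached(FakeMeta meta, String table) {
        return meta.scanAllRegions(table);
    }

    /** Models the cached path: the metadata is scanned once per table. */
    public static class CachingLocator {
        private final FakeMeta meta;
        private final Map<String, List<String>> cache = new HashMap<>();

        public CachingLocator(FakeMeta meta) {
            this.meta = meta;
        }

        public List<String> startKeys(String table) {
            // Only the first lookup for a table reaches the metadata.
            return cache.computeIfAbsent(table, meta::scanAllRegions);
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        keys.add("m");

        FakeMeta uncachedMeta = new FakeMeta();
        uncachedMeta.addTable("t", keys);
        for (int i = 0; i < 5; i++) {
            startKeysUncached(uncachedMeta, "t");
        }
        System.out.println("uncached scans: " + uncachedMeta.fullScans);

        FakeMeta cachedMeta = new FakeMeta();
        cachedMeta.addTable("t", keys);
        CachingLocator locator = new CachingLocator(cachedMeta);
        for (int i = 0; i < 5; i++) {
            locator.startKeys("t");
        }
        System.out.println("cached scans: " + cachedMeta.fullScans);
    }
}
```

In this model the uncached path pays for one metadata scan per endpoint call, while the cached path pays once and then serves lookups from memory.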

Cheers
On Sat, Aug 17, 2013 at 2:21 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> Ted, can you elaborate a little on why this issue boosts performance?
> I couldn't figure out from the issue comments whether execCoprocessor scans
> the entire .META. table or an entire table, so I can't tell what the actual
> improvement is.
>
> Thanks!
>
>
>
>
> On Fri, Aug 9, 2013 at 8:44 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > I think you need HBASE-6870 which went into 0.94.8
> >
> > Upgrading should boost coprocessor performance.
> >
> > Cheers
> >
> > On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:
> >
> > > Ted,
> > > Here is the method signature/protocol
> > > public Map<String, Double> getFoo(Map<String, Double> input,
> > > int topN) throws IOException;
> > >
> > > There are 31 regions on 4 nodes X 8 CPU.
> > > I am on 0.94.6 (from Hortonworks).
> > > I think it behaves the way linwukang describes: it is almost a full table scan in the coprocessor.
> > > Actually, when I set more specific ColumnPrefixFilters, performance went down.
> > > I want to do things on the server side because I don't want to send 500K column/values to the client.
> > > I cannot believe a single-threaded client that does some calculations and a group-by beats the coprocessor running in 31 regions.
> > >
> > > Regards,
> > > - kiru
> > >
> > >
> > > Kiru Pakkirisamy | webcloudtech.wordpress.com
> > >
> > >
> > > ________________________________
> > > From: Ted Yu <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]; Kiru Pakkirisamy <[EMAIL PROTECTED]>
> > > Sent: Thursday, August 8, 2013 8:40 PM
> > > Subject: Re: Client Get vs Coprocessor scan performance
> > >
> > >
> > > Can you give us a bit more information ?
> > >
> > > How do you deliver the 55 rowkeys to your endpoint ?
> > > How many regions do you have for this table ?
> > >
> > > What HBase version are you using ?
> > >
> > > Thanks
> > >
> > > On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:
> > >
> > >> Hi,
> > >> I am seeing odd behavior: coprocessor performance lags a client-side Get.
> > >> I have a table with 500000 rows. Each row has a variable number of columns in one column family (in this case about 600000 columns in total are processed).
> > >> When I try to get 55 specific rows, the client side completes in half the time of the coprocessor endpoint.
> > >> I am using 55 RowFilters on the coprocessor scan side. The rows are processed in exactly the same way in both cases.
> > >> Any pointers on how to debug this scenario?
> > >>
> > >> Regards,
> > >> - kiru
> > >>
> > >>
> > >> Kiru Pakkirisamy | webcloudtech.wordpress.com
> >
>