HBase, mail # user - Client Get vs Coprocessor scan performance


Re: Client Get vs Coprocessor scan performance
Asaf Mesika 2013-08-17, 21:21
Ted, can you elaborate a little on why this issue boosts performance?
I couldn't tell from the issue comments whether execCoprocessor scans
the entire .META. table or an entire table, so I can't gauge the actual
improvement.

Thanks!
On Fri, Aug 9, 2013 at 8:44 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> I think you need HBASE-6870 which went into 0.94.8
>
> Upgrading should boost coprocessor performance.
>
> Cheers
>
> On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]>
> wrote:
>
> > Ted,
> > Here is the method signature/protocol
> > public Map<String, Double> getFoo(Map<String, Double> input,
> > int topN) throws IOException;
> >
> > There are 31 regions on 4 nodes X 8 CPU.
> > I am on 0.94.6 (from Hortonworks).
> > I think it behaves the way linwukang describes: it is almost a full
> > table scan in the coprocessor.
> > Actually, when I set more specific ColumnPrefixFilters, performance went
> > down.
> > I want to do this on the server side because I don't want to send
> > 500K columns/values to the client.
> > I cannot believe a single-threaded client that does some calculations
> > and a group-by beats the coprocessor running in 31 regions.
> >
> > Regards,
> > - kiru
> >
> >
> > Kiru Pakkirisamy | webcloudtech.wordpress.com
> >
> >
> > ________________________________
> > From: Ted Yu <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; Kiru Pakkirisamy <[EMAIL PROTECTED]>
> > Sent: Thursday, August 8, 2013 8:40 PM
> > Subject: Re: Client Get vs Coprocessor scan performance
> >
> >
> > Can you give us a bit more information ?
> >
> > How do you deliver the 55 rowkeys to your endpoint ?
> > How many regions do you have for this table ?
> >
> > What HBase version are you using ?
> >
> > Thanks
> >
> > On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy
> > <[EMAIL PROTECTED]> wrote:
> >
> >> Hi,
> >> I am seeing odd behavior: coprocessor performance lags behind a
> >> client-side Get.
> >> I have a table with 500000 rows. Each row has a variable number of
> >> columns in one column family (in this case about 600000 columns in
> >> total are processed).
> >> When I try to get 55 specific rows, the client side completes in half
> >> the time of the coprocessor endpoint.
> >> I am using 55 RowFilters on the coprocessor scan side. The rows are
> >> processed in exactly the same way in both cases.
> >> Any pointers on how to debug this scenario?
> >>
> >> Regards,
> >> - kiru
> >>
> >>
> >> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
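
To make the "almost a full table scan" observation above concrete, here is a rough sketch in plain Java. It does not use the HBase API at all: a TreeMap stands in for a sorted table, and the class and method names (ScanVsGetSketch, compare) are made up for illustration. It assumes the scan has no start/stop row, so every row is visited and the row filters only decide which rows to keep, while the Get path seeks directly to each of the 55 keys. It only counts rows touched; it says nothing about actual HBase timings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

public class ScanVsGetSketch {

    // Returns {rows examined by filtered scan, rows examined by point lookups,
    //          rows matched by scan, rows matched by lookups}.
    // Assumes totalRows >= wantedRows.
    public static int[] compare(int totalRows, int wantedRows) {
        // Sorted in-memory map standing in for a table ordered by row key.
        TreeMap<String, Integer> table = new TreeMap<>();
        for (int i = 0; i < totalRows; i++) {
            table.put(String.format("row%08d", i), i);
        }

        // Pick evenly spaced row keys as the rows the caller wants.
        List<String> wanted = new ArrayList<>();
        int step = totalRows / wantedRows;
        for (int i = 0; i < wantedRows; i++) {
            wanted.add(String.format("row%08d", i * step));
        }
        Set<String> wantedSet = new HashSet<>(wanted);

        // Filter-only scan: every row is visited, the filter just keeps or drops it.
        int scanExamined = 0;
        int scanMatched = 0;
        for (String key : table.keySet()) {
            scanExamined++;
            if (wantedSet.contains(key)) {
                scanMatched++;
            }
        }

        // Point lookups: one direct seek per wanted key.
        int getExamined = 0;
        int getMatched = 0;
        for (String key : wanted) {
            getExamined++;
            if (table.containsKey(key)) {
                getMatched++;
            }
        }

        return new int[] {scanExamined, getExamined, scanMatched, getMatched};
    }

    public static void main(String[] args) {
        // Row counts taken from the thread: 500000 rows, 55 wanted rows.
        int[] r = compare(500_000, 55);
        System.out.println("filtered scan examined " + r[0] + " rows, matched " + r[2]);
        System.out.println("point lookups examined " + r[1] + " rows, matched " + r[3]);
    }
}
```

Both paths return the same rows, but the filter-only scan touches every row to find them. If that is what is happening inside the endpoint, bounding each scan with start/stop rows, or issuing Gets inside the coprocessor, would be the first thing to try.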