HBase, mail # user - Coprocessor Aggregation supposed to be ~20x slower than Scans?


Re: Coprocessor Aggregation supposed to be ~20x slower than Scans?
anil gupta 2012-05-15, 17:34
Hi Ted,

I created the JIRA https://issues.apache.org/jira/browse/HBASE-5999 for
fixing this.

Creating the patch might take me some time (due to the learning curve), as
this is the first time I will be creating a patch.

Thanks,
Anil Gupta
On Mon, May 14, 2012 at 4:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> I was aware of the following change.
>
> Can you log a JIRA and attach the patch to it ?
>
> Thanks for trying out and improving aggregation client.
>
> On Mon, May 14, 2012 at 3:31 PM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Hi Ted,
> >
> > If we change the if-statement condition in the validateParameters method
> > in AggregationClient.java to:
> >
> >   if (scan == null
> >       || (Bytes.equals(scan.getStartRow(), scan.getStopRow())
> >           && !Bytes.equals(scan.getStartRow(), HConstants.EMPTY_START_ROW))
> >       || (Bytes.compareTo(scan.getStartRow(), scan.getStopRow()) > 0
> >           && *!Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW)*))
> >
> > then the condition shown in bold and italics (between the asterisks) will
> > handle the case where the stopRow is not specified. IMHO, it is not an
> > error to leave the stopRow unspecified. This is what I was looking for,
> > because in my case I didn't want to set the stop row, as I am using a
> > prefix filter. I have tested the above code and it works fine when I only
> > specify the startRow. Is this desirable functionality? If yes, should it
> > be added to trunk?
> >
> > Here is the link for source of AggregationClient:
> >
> >
> http://grepcode.com/file_/repo1.maven.org/maven2/org.apache.hbase/hbase/0.92.0/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java/?v=source
> >
> > Thanks,
> > Anil Gupta
> >
> >
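To make the difference between the two checks concrete, here is a minimal, dependency-free sketch. It assumes, as in HBase, that HConstants.EMPTY_START_ROW and HConstants.EMPTY_END_ROW are both empty byte arrays and that Bytes.compareTo is an unsigned lexicographic comparison. The class and method names (ScanRangeCheck, originalCheckRejects, proposedCheckRejects) are hypothetical stand-ins, not the AggregationClient API; the "original" check is taken to be the proposed condition without the emphasized clause, and the `scan == null` test is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for AggregationClient.validateParameters; not HBase code.
public class ScanRangeCheck {
    // Assumption: mirrors HConstants.EMPTY_START_ROW / EMPTY_END_ROW (empty arrays).
    static final byte[] EMPTY_START_ROW = new byte[0];
    static final byte[] EMPTY_END_ROW = new byte[0];

    // Unsigned lexicographic comparison, like Bytes.compareTo (needs Java 9+).
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Original check: rejects start > stop even when stop is empty ("open-ended").
    static boolean originalCheckRejects(byte[] start, byte[] stop) {
        return (Arrays.equals(start, stop) && !Arrays.equals(start, EMPTY_START_ROW))
                || compare(start, stop) > 0;
    }

    // Proposed check: an empty stop row means "scan to the end of the table".
    static boolean proposedCheckRejects(byte[] start, byte[] stop) {
        return (Arrays.equals(start, stop) && !Arrays.equals(start, EMPTY_START_ROW))
                || (compare(start, stop) > 0 && !Arrays.equals(stop, EMPTY_END_ROW));
    }

    public static void main(String[] args) {
        byte[] prefix = "user123".getBytes(StandardCharsets.UTF_8);
        // Only a start row set (as in Code 1): original rejects, proposed accepts.
        System.out.println(originalCheckRejects(prefix, EMPTY_END_ROW)); // true
        System.out.println(proposedCheckRejects(prefix, EMPTY_END_ROW)); // false
    }
}
```

Both versions still reject a start row that sorts after a non-empty stop row; only the open-ended case changes.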
> > On Mon, May 14, 2012 at 1:58 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Anil:
> > > As code #3 shows, having a stopRow helps narrow the range of rows
> > > participating in the aggregation.
> > >
> > > Do you have a suggestion on how this process can be made more
> > > user-friendly?
> > >
> > > Thanks
> > >
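One way to make a prefix-based aggregation friendlier without relaxing the check is to derive the stopRow from the prefix: the smallest key greater than every key that starts with the prefix is obtained by incrementing the last byte that is not 0xFF and dropping everything after it. The helper below is a hypothetical sketch (stopRowForPrefix is not part of the HBase 0.92 API). When every byte is 0xFF, or the prefix is empty, there is no finite upper bound, so it returns an empty array, meaning "scan to the end of the table".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixStopRow {
    // Hypothetical helper: exclusive stop row covering exactly the keys with this prefix.
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                // Keep bytes up to i, then bump byte i (unsigned order is preserved,
                // e.g. 0x7F becomes 0x80).
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        // All bytes are 0xFF (or prefix is empty): no finite upper bound exists.
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // abd
    }
}
```

With a helper like this, a scan such as Code 3 could pair `scan.setStartRow(prefixBytes)` with `scan.setStopRow(stopRowForPrefix(prefixBytes))`, which passes the existing validation and keeps the row range tight.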
> > > On Mon, May 14, 2012 at 1:47 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > My bad, I missed a big difference between the Scan object I am using
> > > > in my filter and the Scan object used in the coprocessor, so the two
> > > > Scan objects are not the same. Basically, I am filtering on the basis
> > > > of a prefix of the RowKey.
> > > >
> > > > So, in my filter I do this to build the scanner:
> > > > Code 1:
> > > >   Filter filter = new PrefixFilter(Bytes.toBytes(strPrefix));
> > > >   Scan scan = new Scan();
> > > >   scan.setFilter(filter);
> > > >   // I don't set any stopRow in this scanner.
> > > >   scan.setStartRow(Bytes.toBytes(strPrefix));
> > > >
> > > > In the coprocessor, I do the following for the scanner:
> > > > Code 2:
> > > >   Scan scan = new Scan();
> > > >   scan.setFilter(new PrefixFilter(Bytes.toBytes(prefix)));
> > > >
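The gap between Code 1 and Code 2 could explain a large part of the slowdown in the subject line: with a startRow the scan seeks straight to the first matching key, whereas a PrefixFilter alone still reads every row that sorts before the prefix. The toy model below uses a TreeMap of String keys as a stand-in for a table's sorted row keys and only counts rows visited. It assumes the filter lets the scan stop once it has passed the prefix range; without that early exit, Code 2 would read to the end of the table. It ignores region parallelism, caching and everything else in HBase, so treat it as an illustration, not a benchmark.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model: a TreeMap stands in for a table's row keys (sorted, like HBase).
public class PrefixScanCost {
    // Code 1 style: seek to startRow = prefix, stop at the first non-matching key.
    static int rowsVisitedWithStartRow(NavigableMap<String, Integer> table, String prefix) {
        int visited = 0;
        for (String key : table.tailMap(prefix, true).keySet()) {
            visited++;
            if (!key.startsWith(prefix)) break; // passed the prefix range
        }
        return visited;
    }

    // Code 2 style: start at the first row, filter by prefix, stop once past it.
    static int rowsVisitedFilterOnly(NavigableMap<String, Integer> table, String prefix) {
        int visited = 0;
        for (String key : table.keySet()) {
            visited++;
            // A non-matching key that sorts after the prefix means no more matches.
            if (!key.startsWith(prefix) && key.compareTo(prefix) > 0) break;
        }
        return visited;
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("user%04d", i), i); // user0000 .. user0999
        }
        // 10 keys match "user049" (user0490 .. user0499).
        System.out.println(rowsVisitedWithStartRow(table, "user049")); // 11
        System.out.println(rowsVisitedFilterOnly(table, "user049"));   // 501
    }
}
```

Both scans return the same 10 matching rows; the filter-only scan simply touches the 490 rows before them as well, and that overhead grows with the number of keys that sort ahead of the prefix.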
> > > > I don't have a startRow in the above code because, if I use only the
> > > > startRow in the coprocessor scanner, I get the following exception
> > > > (due to this, I removed the startRow from the CP scan object code):
> > > >
> > > > java.io.IOException: Agg client Exception: Startrow should be smaller than Stoprow
> > > >    at org.apache.hadoop.hbase.client.coprocessor.AggregationClient.validateParameters(AggregationClient.java:116)
> > > >    at org.apache.hadoop.hbase.client.coprocessor.AggregationClient.max(AggregationClient.java:85)
> > > >    at com.intuit.ihub.hbase.poc.DummyClass.doAggregation(DummyClass.java:81)
> > > >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > >
> > > >
> > > > I modified the above code #2 to also add the stopRow:
> > > > Code 3:
> > > >   Scan scan = new Scan();
> > > >   scan.setStartRow(Bytes.toBytes(prefix));

Thanks & Regards,
Anil Gupta