Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> Coprocessor Aggregation supposed to be ~20x slower than Scans?


anil gupta 2012-05-14, 19:02
Stack 2012-05-14, 19:08
anil gupta 2012-05-14, 19:31
Ted Yu 2012-05-14, 19:55
Re: Coprocessor Aggregation supposed to be ~20x slower than Scans?
Hi Ted,

My bad, I missed a big difference between the Scan object I am using with
my filter and the Scan object used in the coprocessor, so the two Scan
objects are not the same. Basically, I am filtering on a prefix of the row key.

So, for my filter-based scan I do this to build the scanner:
Code 1:
Filter filter = new PrefixFilter(Bytes.toBytes(strPrefix));
Scan scan = new Scan();
scan.setFilter(filter);
scan.setStartRow(Bytes.toBytes(strPrefix)); // I don't set any stopRow in this scanner.

In the coprocessor, I do the following for the scanner:
Code 2:
Scan scan = new Scan();
scan.setFilter(new PrefixFilter(Bytes.toBytes(prefix))); // neither startRow nor stopRow is set

I don't set a startRow in the code above because if I set only the startRow
in the coprocessor scanner, I get the following exception (which is why I
removed the startRow from the CP Scan object):
java.io.IOException: Agg client Exception: Startrow should be smaller than Stoprow
    at org.apache.hadoop.hbase.client.coprocessor.AggregationClient.validateParameters(AggregationClient.java:116)
    at org.apache.hadoop.hbase.client.coprocessor.AggregationClient.max(AggregationClient.java:85)
    at com.intuit.ihub.hbase.poc.DummyClass.doAggregation(DummyClass.java:81)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
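The message says the start row must be smaller than the stop row, which suggests the client-side check compares a non-empty start row against the empty stop row that HBase uses to mean "no upper bound" and rejects it. Below is a minimal sketch of a check that accepts such open-ended ranges while still rejecting inverted ones, assuming unsigned byte-wise row key order. `ScanRangeCheck`, `compareRows`, and `isValidRange` are hypothetical names; this is not the actual `AggregationClient.validateParameters` code.

```java
import java.nio.charset.StandardCharsets;

public class ScanRangeCheck {

    /** Unsigned lexicographic comparison of two row keys (the order HBase sorts rows in). */
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter key that is a prefix of the other sorts first
    }

    /**
     * Hypothetical relaxed check: an empty stop row means "scan to the end of the table",
     * so a range with only a start row is accepted. A non-empty stop row must sort
     * strictly after the start row.
     */
    static boolean isValidRange(byte[] startRow, byte[] stopRow) {
        if (stopRow.length == 0) {
            return true; // open-ended scan: [startRow, end of table)
        }
        return compareRows(startRow, stopRow) < 0;
    }

    public static void main(String[] args) {
        byte[] start = "123".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isValidRange(start, new byte[0]));                               // true
        System.out.println(isValidRange(start, "124".getBytes(StandardCharsets.US_ASCII))); // true
        System.out.println(isValidRange("999".getBytes(StandardCharsets.US_ASCII),
                "1000".getBytes(StandardCharsets.US_ASCII)));                               // false
    }
}
```

The last case shows that even with the check relaxed, a stop row of "1000" for a start row of "999" is still an inverted range, because keys compare byte by byte rather than numerically.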
I modified Code 2 above to also add the stopRow:
Code 3:
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes(prefix));
scan.setStopRow(Bytes.toBytes(String.valueOf(Long.parseLong(prefix) + 1))); // stopRow = numeric prefix + 1
scan.setFilter(new PrefixFilter(Bytes.toBytes(prefix)));
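Code 3 derives the stop row with `Long.parseLong(prefix) + 1`, which only gives a tight bound when the prefix is a decimal number with no leading zeros whose increment keeps the same length. A more general approach, sketched here under the assumption that row keys are compared as unsigned bytes, is to increment the last byte of the prefix that is not 0xFF and drop everything after it. `PrefixStopRow` and `stopRowForPrefix` are hypothetical helpers, not HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixStopRow {

    /**
     * Returns the smallest row key that sorts after every key starting with {@code prefix}
     * (unsigned byte-wise order), or an empty array if the prefix is all 0xFF bytes,
     * meaning there is no upper bound and the scan must run to the end of the table.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if ((stop[i] & 0xFF) != 0xFF) {
                stop[i]++;                         // bump the last byte that can still be incremented
                return Arrays.copyOf(stop, i + 1); // and drop the trailing 0xFF bytes after it
            }
        }
        return new byte[0];                        // every byte was 0xFF: no upper bound
    }

    static String asString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (String prefix : new String[] {"123", "999", "007"}) {
            byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
            String numeric = String.valueOf(Long.parseLong(prefix) + 1); // Code 3's approach
            System.out.println(prefix + " -> byte increment: " + asString(stopRowForPrefix(p))
                    + ", numeric +1: " + numeric);
        }
    }
}
```

For "999" the byte increment gives "99:" (':' follows '9' in ASCII), while the numeric version gives "1000", which sorts before "999" and would fail the start/stop check again. For "007" the numeric version gives "8", a valid but much looser bound that also covers keys such as "01..." through "7...".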

When I run the coprocessor with Code 3, it's blazing fast: it gives the
result in around 200 milliseconds. :)
Since this was just a test of coprocessors, I added the logic to set the
stopRow manually. Why does the Scan object used by the coprocessor always
require a stopRow along with the startRow? (Code 1 works fine even when I
don't set a stopRow.) Can this restriction be relaxed?
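One way to see why bounding the scan matters is a toy model that counts how many rows each Scan configuration above would read before the PrefixFilter discards non-matching ones. It assumes a single sorted table of 70,000 zero-padded numeric keys (hypothetical data, not the actual test table) and ignores region boundaries, any early termination a filter may signal, and all I/O costs, so it does not reproduce the measured timings; it only shows how the key range shrinks once a stop row is set. `PrefixScanCost`, `rowsRead`, and `toyTable` are hypothetical names.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class PrefixScanCost {

    /**
     * Counts rows in [startRow, stopRow) that a scanner would read before any filter runs.
     * An empty startRow means "from the first row" and an empty stopRow means "to the last row",
     * mirroring HBase's conventions. String order matches byte order for these ASCII keys.
     */
    static int rowsRead(NavigableMap<String, String> table, String startRow, String stopRow) {
        NavigableMap<String, String> range = table.tailMap(startRow, true);
        if (!stopRow.isEmpty()) {
            range = range.headMap(stopRow, false); // stop row is exclusive
        }
        return range.size();
    }

    /** Builds a toy table with zero-padded numeric row keys "00000", "00001", ... */
    static NavigableMap<String, String> toyTable(int rows) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < rows; i++) {
            table.put(String.format("%05d", i), "value-" + i);
        }
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = toyTable(70_000);
        String prefix = "123"; // matches rows "12300" .. "12399"
        System.out.println("Code 2 (filter only):          " + rowsRead(table, "", ""));      // 70000
        System.out.println("Code 1 (start row + filter):   " + rowsRead(table, prefix, ""));  // 57700
        System.out.println("Code 3 (start, stop + filter): " + rowsRead(table, prefix, "124")); // 100
    }
}
```

In this model only Code 3 confines the read to the 100 matching rows; the other two configurations leave every row from the start point to the end of the table for the filter to examine.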

Thanks,
Anil Gupta

On Mon, May 14, 2012 at 12:55 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Anil:
> I think the performance was related to your custom filter.
>
> Please tell us more about the filter next time.
>
> Thanks
>
> On Mon, May 14, 2012 at 12:31 PM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Hi Stack,
> >
> > I'll look into Gary Helmling's post, try to profile the coprocessor, and
> > share the results.
> >
> > Thanks,
> > Anil Gupta
> >
> > On Mon, May 14, 2012 at 12:08 PM, Stack <[EMAIL PROTECTED]> wrote:
> >
> > > On Mon, May 14, 2012 at 12:02 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> > > > I loaded around 70 thousand 1-2 KB records in HBase. For scans with my
> > > > custom filter, I am able to get 97 rows in 500 milliseconds, but doing
> > > > sum, max, and min (the built-in aggregations of HBase) with the same
> > > > custom filter takes 11000 milliseconds. Does this mean that coprocessor
> > > > aggregation is supposed to be ~20x slower than scans? Am I missing any
> > > > trick here?
> > > >
> > >
> > > That seems like a high tax to pay for running CPs. Can you dig in on
> > > where the time is being spent? (See another recent note on this list
> > > or on dev where Gary Helmling talks about how he did basic profiling
> > > of CPs.)
> > > St.Ack
> > >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>

--
Thanks & Regards,
Anil Gupta
Ted Yu 2012-05-14, 20:58
anil gupta 2012-05-14, 22:31
Ted Yu 2012-05-14, 23:00
anil gupta 2012-05-15, 17:34
Ted Yu 2012-05-15, 17:47
Ted Yu 2012-05-15, 18:46
anil gupta 2012-05-15, 19:09
Ted Yu 2012-05-15, 20:37
anil gupta 2012-05-15, 23:58
Ted Yu 2012-05-16, 00:07
anil gupta 2012-05-16, 00:30
Ted Yu 2012-05-16, 00:34
Jimmy Xiang 2012-05-16, 17:28
Anil Gupta 2012-05-16, 18:15
anil gupta 2012-05-15, 18:04