HBase >> mail # user >> Client Get vs Coprocessor scan performance


Re: Client Get vs Coprocessor scan performance
Hey Kiru,
Another option for you may be to use Phoenix (
https://github.com/forcedotcom/phoenix). In particular, our skip scan may
be what you're looking for:
http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html.
Under the covers, the skip scan does a series of parallel scans, taking
advantage of both coprocessors and SEEK_NEXT_USING_HINT. As the post
shows, it's more than 2x faster than the batched get approach. On top of
that, your queries are not limited to point gets; range scans
leverage it as well.
Thanks,
James
@JamesPlusPlus
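
To make the seek-versus-scan idea concrete, here is a minimal toy sketch in plain Java. It is not Phoenix or HBase code: the sorted table is modelled with a TreeMap, and the class and method names (SkipScanSketch, filteredFullScan, seekScan) are made up for illustration. It only shows why jumping to each wanted key touches far fewer rows than filtering every row in a full scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of seek-based scanning over a sorted key space (hypothetical, not Phoenix/HBase code). */
final class SkipScanSketch {

    /** Matching values plus a count of how many rows the scan touched. */
    static final class ScanResult {
        final List<String> values = new ArrayList<>();
        int rowsVisited = 0;
    }

    /** Full scan with a filter: every row is visited and tested against the wanted keys. */
    static ScanResult filteredFullScan(NavigableMap<String, String> table, SortedSet<String> wanted) {
        ScanResult result = new ScanResult();
        for (Map.Entry<String, String> row : table.entrySet()) {
            result.rowsVisited++;
            if (wanted.contains(row.getKey())) {
                result.values.add(row.getValue());
            }
        }
        return result;
    }

    /** Seek-based scan: for each wanted key, jump straight to the first row at or after it. */
    static ScanResult seekScan(NavigableMap<String, String> table, SortedSet<String> wanted) {
        ScanResult result = new ScanResult();
        for (String key : wanted) {
            Map.Entry<String, String> row = table.ceilingEntry(key);
            if (row == null) {
                break; // no rows at or after this key, so later (larger) keys cannot match either
            }
            result.rowsVisited++; // one row touched per seek
            if (row.getKey().equals(key)) {
                result.values.add(row.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), "v" + i);
        }
        SortedSet<String> wanted = new TreeSet<>(Arrays.asList("row0010", "row0500", "row0990"));

        ScanResult full = filteredFullScan(table, wanted);
        ScanResult seek = seekScan(table, wanted);
        System.out.println("full scan visited " + full.rowsVisited + " rows, found " + full.values);
        System.out.println("seek scan visited " + seek.rowsVisited + " rows, found " + seek.values);
    }
}
```

Both methods return the same values; the difference is only in how many rows are touched, which is the effect a seek hint is meant to achieve on the server side.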
On Sat, Aug 10, 2013 at 11:15 PM, Kiru Pakkirisamy <
[EMAIL PROTECTED]> wrote:

> Maybe I spoke too soon. HBASE-6870 fixes the table scan (as verified by
> metrics of read requests on the region).
> But the performance with RowFilter is very bad (actually worse than a full
> table scan; I don't know how this can happen).
> I hope my API usage is right. All I am doing is adding RowFilters to a
> FilterList and calling setFilter on the scan.
> I tried looking into the AggregateImplementation (which is mentioned as the
> unit test for this bug) but did not follow through because I am in a rush
> for a good workaround.
> I have now replaced RowFilters with a Get on the Region (in a loop) after
> making sure my key is within startKey and endKey of the region.
> I think this is getting my data right. Performance is very good, almost
> half that of the full scan code we had in the coprocessor earlier.
> Are there any gotchas/bad side-effects to using a Get on the Region ?
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>
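
The key-range check described in the message above (only issuing a Get when the key falls between the region's startKey and endKey) can be sketched in plain Java as follows. This is a minimal illustration, not the poster's code: the class name RegionKeyRange and its methods are hypothetical, and it assumes the usual HBase conventions that the start key is inclusive, the end key is exclusive, an empty key means "unbounded", and row keys compare as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: decides whether a row key belongs to a region's key range. */
final class RegionKeyRange {
    private final byte[] startKey; // inclusive; empty means "start of table" (assumed convention)
    private final byte[] endKey;   // exclusive; empty means "end of table" (assumed convention)

    RegionKeyRange(byte[] startKey, byte[] endKey) {
        this.startKey = startKey.clone();
        this.endKey = endKey.clone();
    }

    /** True if startKey <= row < endKey under unsigned lexicographic byte ordering. */
    boolean contains(byte[] row) {
        boolean atOrAfterStart = startKey.length == 0 || Arrays.compareUnsigned(row, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0 || Arrays.compareUnsigned(row, endKey) < 0;
        return atOrAfterStart && beforeEnd;
    }

    /** Convenience for the demo: encode a string key as UTF-8 bytes. */
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RegionKeyRange region = new RegionKeyRange(bytes("b"), bytes("m"));
        for (String key : new String[] {"a", "b", "lzz", "m"}) {
            // In a real endpoint, a Get would only be issued when this check passes.
            System.out.println(key + " in region: " + region.contains(bytes(key)));
        }
    }
}
```

Skipping keys that fall outside the range avoids issuing Gets that the local region could never answer.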
> ________________________________
> From: Kiru Pakkirisamy <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Friday, August 9, 2013 1:04 PM
> Subject: Re: Client Get vs Coprocessor scan performance
>
>
> I think this fixes my issues. On our dev cluster what used to take 1200
> msec is now in the 700-800 msec region. Thanks again.
> I will soon be deploying this to our performance cluster, where our query
> is in the 15-second range.
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>
> ________________________________
> From: Ted Yu <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Thursday, August 8, 2013 10:44 PM
> Subject: Re: Client Get vs Coprocessor scan performance
>
>
> I think you need HBASE-6870 which went into 0.94.8
>
> Upgrading should boost coprocessor performance.
>
> Cheers
>
> On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]>
> wrote:
>
> > Ted,
> > Here is the method signature/protocol
> > public Map<String, Double> getFoo(Map<String, Double> input,
> > int topN) throws IOException;
> >
> > There are 31 regions on 4 nodes X 8 CPU.
> > I am on 0.94.6 (from Hortonworks).
> > It seems to behave like what linwukang says: it is almost a
> > full table scan in the coprocessor.
> > Actually, when I set more specific ColumnPrefixFilters, performance went
> > down.
> > I want to do things on the server side because I don't want to be
> > sending 500K column/values to the client.
> > I cannot believe a single-threaded client which does some calculations
> > and a group-by beats the coprocessor running in 31 regions.
> >
> > Regards,
> > - kiru
> >
> >
> > Kiru Pakkirisamy | webcloudtech.wordpress.com
> >
> >
> > ________________________________
> > From: Ted Yu <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; Kiru Pakkirisamy <[EMAIL PROTECTED]>
> > Sent: Thursday, August 8, 2013 8:40 PM
> > Subject: Re: Client Get vs Coprocessor scan performance
> >
> >
> > Can you give us a bit more information ?
> >
> > How do you deliver the 55 rowkeys to your endpoint ?
> > How many regions do you have for this table ?
> >
> > What HBase version are you using ?
> >
> > Thanks
> >
> > On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy
> > <[EMAIL PROTECTED]> wrote: