HBase >> mail # user >> Client Get vs Coprocessor scan performance


Re: Client Get vs Coprocessor scan performance
Kiru,
If you're able to post the key values, row key structure, and data types
you're using, I can post the Phoenix code to query against it. You're doing
some kind of aggregation too, right? If you could explain that part too,
that would be helpful. It's likely that you can just query the existing
HBase data you've already created on the same cluster you're already using
(provided you put the phoenix jar on all the region servers - use our 2.0.0
version that just came out). Might be interesting to compare the amount of
code necessary in each approach as well.
Thanks,
James
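
The fixed-length padding mentioned in the quoted message below can be illustrated without HBase at all. What follows is a minimal plain-Java sketch, not the FuzzyRowFilter implementation. It assumes the mask convention documented for FuzzyRowFilter (0 means the byte at that position must match, 1 means any byte is accepted) and, as a simplification, compares only the positions covered by the pattern. `FuzzyKeySketch`, `padKey`, `matches`, the zero pad byte, and the width of 4 are hypothetical names and values chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: mimics a fuzzy-mask comparison in plain Java.
// It does not call HBase and is not the FuzzyRowFilter source.
final class FuzzyKeySketch {

    // Right-pads a key with padByte to exactly `width` bytes (hypothetical helper).
    public static byte[] padKey(String key, int width, byte padByte) {
        byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        if (raw.length > width) {
            throw new IllegalArgumentException("key is longer than the fixed width");
        }
        byte[] out = new byte[width];
        Arrays.fill(out, padByte);
        System.arraycopy(raw, 0, out, 0, raw.length);
        return out;
    }

    // Assumed convention: mask[i] == 0 means position i must match,
    // mask[i] == 1 means any byte is accepted at position i.
    // Only the first pattern.length bytes of the row key are examined.
    public static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have equal length");
        }
        if (rowKey.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte pad = 0;
        byte[] allFixed2 = {0, 0};
        byte[] allFixed4 = {0, 0, 0, 0};

        // Variable-length keys: "ab" with an all-fixed mask also accepts "abc",
        // because nothing constrains the bytes after the pattern.
        System.out.println(matches(
                "abc".getBytes(StandardCharsets.UTF_8),
                "ab".getBytes(StandardCharsets.UTF_8),
                allFixed2));

        // Padded keys: the pad bytes are part of the pattern, so "abc" is rejected
        // and only the padded "ab" is a complete match.
        System.out.println(matches(padKey("abc", 4, pad), padKey("ab", 4, pad), allFixed4));
        System.out.println(matches(padKey("ab", 4, pad), padKey("ab", 4, pad), allFixed4));
    }
}
```

Under these assumptions, padding turns a prefix-style comparison into a complete-key comparison, which is consistent with the observation that the filter behaves well once all keys share one length.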
On Sun, Aug 18, 2013 at 12:16 PM, Kiru Pakkirisamy <
[EMAIL PROTECTED]> wrote:

> James,
> I am using the FuzzyRowFilter or the Gets within a Coprocessor. It looks
> like I cannot use your SkipScanFilter by itself, as it has lots of Phoenix
> imports. I thought of writing my own custom filter and saw that the
> FuzzyRowFilter in the 0.94 branch also has an implementation of
> getNextKeyHint(), except that it works well only with fixed-length keys if I
> want a complete match of the keys. After padding my keys to a fixed
> length, it seems to be fine.
> Once I confirm some key locality and other issues (like heap), I will try
> to benchmark this table alone against Phoenix on another cluster. Thanks.
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>
> ________________________________
>  From: James Taylor <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc: Kiru Pakkirisamy <[EMAIL PROTECTED]>
> Sent: Sunday, August 18, 2013 11:44 AM
> Subject: Re: Client Get vs Coprocessor scan performance
>
>
> Would be interesting to compare against Phoenix's Skip Scan
> (http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html)
> which does a scan through a coprocessor and is more than 2x faster
> than multi Get (plus handles multi-range scans in addition to point
> gets).
>
> James
>
> On Aug 18, 2013, at 6:39 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq. Get'ting 100 rows seems to be faster than the FuzzyRowFilter (mask on
> > the whole length of the key)
> >
> > In this case the Gets are very selective. The number of rows FuzzyRowFilter
> > was evaluated against would be much higher.
> > It would be nice if you remembered how long each took.
> >
> > bq. Also, I am seeing very bad concurrent query performance
> >
> > Were the multi Gets performed by your coprocessor within the region boundary
> > of the respective coprocessor? Just to confirm.
> >
> > bq. that would make Coprocessors almost single threaded across multiple
> > invocations ?
> >
> > Let me dig into code some more.
> >
> > Cheers
> >
> >
> > On Sat, Aug 17, 2013 at 10:34 PM, Kiru Pakkirisamy <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Ted,
> >> On a table with 600K rows, Get'ting 100 rows seems to be faster than the
> >> FuzzyRowFilter (mask on the whole length of the key). I thought the
> >> FuzzyRowFilter's SEEK_NEXT_USING_HINT would help. All this is on the client
> >> side; based on the client-side performance, I have not changed my
> >> Coprocessor to use the FuzzyRowFilter (it is still doing multiple Gets
> >> inside the coprocessor). Also, I am seeing very bad concurrent query
> >> performance. Is there anything that would make Coprocessors almost single
> >> threaded across multiple invocations ?
> >> Again, all this is after putting in 0.94.10 (for HBASE-6870's sake), which
> >> seems to be very good at bringing the regions online fast and balanced.
> >> Thanks, and much appreciated.
> >>
> >> Regards,
> >> - kiru
> >>
> >>
> >> Kiru Pakkirisamy | webcloudtech.wordpress.com
> >>
> >>
> >> ________________________________
> >> From: Ted Yu <[EMAIL PROTECTED]>
> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >> Sent: Saturday, August 17, 2013 4:19 PM
> >> Subject: Re: Client Get vs Coprocessor scan performance
> >>
> >>
> >> HBASE-6870 targeted whole table scanning for each coprocessorService