HBase >> mail # user >> Client Get vs Coprocessor scan performance


Re: Client Get vs Coprocessor scan performance
Kiru,
What's your column family name? Just to confirm, the column qualifier of
your key value is C_10345 and this stores a value as a Double using
Bytes.toBytes(double)? Are any of the Double values negative? Any other key
values?
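
A possible reason the sign of the values matters: Bytes.toBytes(double) writes the raw IEEE 754 bits as eight big-endian bytes, and HBase compares bytes as unsigned values, so negative doubles would sort after positive ones and in reverse order among themselves. The sketch below reproduces that layout with java.nio.ByteBuffer rather than calling HBase; the class and method names are hypothetical and only for illustration.

```java
import java.nio.ByteBuffer;

/** Illustration only: mimics the big-endian IEEE 754 layout of a serialized double. */
final class DoubleByteOrder {

    /** Eight big-endian bytes of the raw bits (ByteBuffer is big-endian by default). */
    static byte[] encode(double d) {
        return ByteBuffer.allocate(8).putDouble(d).array();
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Non-negative values keep their numeric order in byte form.
        System.out.println(compareUnsigned(encode(1.0), encode(2.0)) < 0);
        // The sign bit is the highest bit, so a negative value sorts after a positive one.
        System.out.println(compareUnsigned(encode(-1.0), encode(1.0)) > 0);
        // Among negatives the order is reversed: -2.0 sorts after -1.0.
        System.out.println(compareUnsigned(encode(-2.0), encode(-1.0)) > 0);
    }
}
```

If all stored values are non-negative, byte order and numeric order agree and this concern would not apply.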

Can you give me an idea of the kind of fuzzy filtering you're doing on the
7 char row key? We may want to model that as a set of row key columns in
Phoenix to leverage the skip scan more.
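
To make "fuzzy filtering on the 7 char row key" concrete, here is a minimal sketch of mask-based matching in the style of FuzzyRowFilter, where a mask byte of 0 means the position must match and 1 means any byte is accepted. This is not HBase's implementation: the class name, the sample keys, and the strict equal-length check are assumptions made for illustration (the length check reflects the fixed-length keys mentioned further down the thread).

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: a simplified fixed-length fuzzy match, not the HBase filter. */
final class FuzzyKeyMatch {

    /**
     * Returns true when rowKey equals pattern at every position whose mask byte is 0.
     * Positions whose mask byte is 1 are wildcards. All three arrays must share a length.
     */
    static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (rowKey.length != pattern.length || pattern.length != mask.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical 7 char keys: fix the first two characters, leave the rest open.
        byte[] pattern = ascii("AB?????");
        byte[] mask = {0, 0, 1, 1, 1, 1, 1};
        System.out.println(matches(ascii("AB12345"), pattern, mask));
        System.out.println(matches(ascii("AC12345"), pattern, mask));
    }
}
```

If the fixed positions form a leading prefix, as in this made-up pattern, the same lookup could be expressed as a range on a leading row key column, which is the kind of modeling a skip scan can exploit.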

How about I model your aggregation as an AVG over a group of rows? What
would your GROUP BY expression look like? Are you grouping based on a part
of the 7 char row key? Or on some other key value?
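
As a point of comparison for the AVG and GROUP BY question, the sketch below computes an average per group in plain Java, grouping on a leading part of the row key. Everything specific here is an assumption: the two-character prefix, the sample keys, and the simplification of one double per row. The real grouping expression is exactly what is being asked about.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: in-memory average grouped by a row key prefix. */
final class GroupedAverage {

    /**
     * Groups values by the first prefixLen characters of the row key and averages each group.
     * Assumes every key has at least prefixLen characters (the keys here are 7 chars long).
     */
    static Map<String, Double> averageByPrefix(Map<String, Double> valuesByRowKey, int prefixLen) {
        // Per group: index 0 holds the running sum, index 1 the row count.
        Map<String, double[]> acc = new TreeMap<>();
        for (Map.Entry<String, Double> e : valuesByRowKey.entrySet()) {
            String group = e.getKey().substring(0, prefixLen);
            double[] a = acc.computeIfAbsent(group, k -> new double[2]);
            a[0] += e.getValue();
            a[1] += 1;
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        Map<String, Double> rows = new LinkedHashMap<>();
        rows.put("AB00001", 1.0);
        rows.put("AB00002", 3.0);
        rows.put("CD00001", 5.0);
        System.out.println(averageByPrefix(rows, 2));
    }
}
```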

Thanks,
James
On Sun, Aug 18, 2013 at 2:16 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:

> James,
> Rowkey - String - len - 7
> Col = String - variable length - but looks like C_10345
> Col value = Double
>
> If I can create a Phoenix schema mapping to this existing table that would
> be great. I actually do a group by the column values and return another
> value which is a function of the value and an input double value. Input is
> a Map<String, Double> and return is also a Map<String, Double>.
>
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>   ------------------------------
>  *From:* James Taylor <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]; Kiru Pakkirisamy <[EMAIL PROTECTED]>
> *Sent:* Sunday, August 18, 2013 2:07 PM
>
> *Subject:* Re: Client Get vs Coprocessor scan performance
>
> Kiru,
> If you're able to post the key values, row key structure, and data types
> you're using, I can post the Phoenix code to query against it. You're doing
> some kind of aggregation too, right? If you could explain that part too,
> that would be helpful. It's likely that you can just query the existing
> HBase data you've already created on the same cluster you're already using
> (provided you put the phoenix jar on all the region servers - use our 2.0.0
> version that just came out). Might be interesting to compare the amount of
> code necessary in each approach as well.
> Thanks,
> James
>
>
> On Sun, Aug 18, 2013 at 12:16 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:
>
> James,
> I am using the FuzzyRowFilter or the Gets within a Coprocessor. Looks
> like I cannot use your SkipScanFilter by itself as it has lots of Phoenix
> imports. I thought of writing my own custom filter and saw that the
> FuzzyRowFilter in the 0.94 branch also has an implementation of
> getNextKeyHint(), except that it works well only with fixed-length keys if I
> want a complete match of the keys. After padding my keys to a fixed
> length it seems to be fine.
> Once I confirm some key locality and other issues (like heap), I will try
> to benchmark this table alone against Phoenix on another cluster. Thanks.
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>
> ________________________________
>  From: James Taylor <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc: Kiru Pakkirisamy <[EMAIL PROTECTED]>
> Sent: Sunday, August 18, 2013 11:44 AM
> Subject: Re: Client Get vs Coprocessor scan performance
>
>
> Would be interesting to compare against Phoenix's Skip Scan
> (http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html),
> which does a scan through a coprocessor and is more than 2x faster
> than multi Get (plus handles multi-range scans in addition to point
> gets).
>
> James
>
> On Aug 18, 2013, at 6:39 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq. Get'ting 100 rows seems to be faster than the FuzzyRowFilter (mask on
> > the whole length of the key)
> >
> > In this case the Get's are very selective. The number of rows FuzzyRowFilter
> > was evaluated against would be much higher.
> > It would be nice if you remember the time each took.
> >
> > bq. Also, I am seeing very bad concurrent query performance
> >
> > Were the multi Get's performed by your coprocessor within region boundary