HBase >> mail # user >> Client Get vs Coprocessor scan performance


James Taylor 2013-08-19, 00:34
Re: Client Get vs Coprocessor scan performance
James,
I have only one column family, cp. Yes, that is how I store the Double. No, the doubles are always positive.
The keys look like "A14568 ". There are fewer than a million of them, and I added the letters to randomize them.
I group them by the C_ suffix and, to simplify it, order them by the Double.
Is there a way to do a sort of "user-defined function" on a column that would take care of my calculation on that double?
Thanks again.
 
Regards,
- kiru
Kiru Pakkirisamy | webcloudtech.wordpress.com
________________________________
 From: James Taylor <[EMAIL PROTECTED]>
To: Kiru Pakkirisamy <[EMAIL PROTECTED]>
Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Sunday, August 18, 2013 5:34 PM
Subject: Re: Client Get vs Coprocessor scan performance
 

Kiru,
What's your column family name? Just to confirm, the column qualifier of
your key value is C_10345 and this stores a value as a Double using
Bytes.toBytes(double)? Are any of the Double values negative? Any other key
values?
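
One likely reason the sign question matters: HBase compares values as unsigned bytes, and Bytes.toBytes(double) is understood to write the raw IEEE 754 bits as 8 big-endian bytes. Under that assumption, positive doubles keep their numeric order as bytes, while negative ones do not. The sketch below uses only the JDK (ByteBuffer.putDouble produces the same big-endian layout); the class and method names are illustrative, not part of HBase or Phoenix.

```java
import java.nio.ByteBuffer;

final class DoubleByteOrder {
    // 8 big-endian bytes of the IEEE 754 bit pattern. Assumed to match
    // the layout produced by HBase's Bytes.toBytes(double).
    static byte[] encode(double d) {
        return ByteBuffer.allocate(8).putDouble(d).array();
    }

    // Unsigned lexicographic comparison, the way byte arrays are ordered in HBase.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Positive values: byte order agrees with numeric order.
        System.out.println(compareUnsigned(encode(1.5), encode(2.5)) < 0);
        // The sign bit makes a negative value sort after a positive one.
        System.out.println(compareUnsigned(encode(-1.0), encode(1.0)) > 0);
        // Among negatives the order is reversed: -2.0 sorts after -1.0.
        System.out.println(compareUnsigned(encode(-2.0), encode(-1.0)) > 0);
    }
}
```

With only positive doubles, as in this table, the stored bytes already sort in numeric order, so no re-encoding should be needed for ordering.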

Can you give me an idea of the kind of fuzzy filtering you're doing on the
7 char row key? We may want to model that as a set of row key columns in
Phoenix to leverage the skip scan more.

How about I model your aggregation as an AVG over a group of rows? What
would your GROUP BY expression look like? Are you grouping based on a part
of the 7 char row key? Or on some other key value?
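
To make the proposed shape concrete, here is a client-side sketch of "an AVG over a group" combined with the calculation Kiru describes in the quoted message below (a Map<String, Double> in, a Map<String, Double> out, grouped by column). The per-value function is not given anywhere in the thread, so it is passed in as a parameter; the Cell class, the method name, and the choice of averaging are all assumptions for illustration, not Phoenix or HBase APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleBinaryOperator;

final class GroupedColumnFunction {
    // One stored cell: column qualifier (for example "C_10345") and its Double value.
    static final class Cell {
        final String qualifier;
        final double value;

        Cell(String qualifier, double value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // For each qualifier present in `inputs`, average fn(storedValue, inputValue)
    // over all cells with that qualifier. `fn` stands in for the unspecified
    // calculation; cells whose qualifier has no input are ignored.
    static Map<String, Double> averageByQualifier(List<Cell> cells,
                                                  Map<String, Double> inputs,
                                                  DoubleBinaryOperator fn) {
        Map<String, double[]> acc = new TreeMap<>(); // qualifier -> {sum, count}
        for (Cell c : cells) {
            Double in = inputs.get(c.qualifier);
            if (in == null) {
                continue;
            }
            double[] a = acc.computeIfAbsent(c.qualifier, k -> new double[2]);
            a[0] += fn.applyAsDouble(c.value, in);
            a[1] += 1;
        }
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            out.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return out;
    }
}
```

Passing the function as a parameter keeps the grouping and averaging separate from the calculation itself, which is roughly the split a SQL GROUP BY with an expression inside AVG would give.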

Thanks,
James
On Sun, Aug 18, 2013 at 2:16 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]
> wrote:

> James,
> Rowkey - String - len - 7
> Col = String - variable length - but looks like C_10345
> Col value = Double
>
> If I can create a Phoenix schema mapping to this existing table that would
> be great. I actually do a group by the column values and return another
> value which is a function of the value and an input double value. Input is
> a Map<String, Double> and return is also a Map<String, Double>.
>
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>   ------------------------------
>  *From:* James Taylor <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]; Kiru Pakkirisamy <[EMAIL PROTECTED]>
> *Sent:* Sunday, August 18, 2013 2:07 PM
>
> *Subject:* Re: Client Get vs Coprocessor scan performance
>
> Kiru,
> If you're able to post the key values, row key structure, and data types
> you're using, I can post the Phoenix code to query against it. You're doing
> some kind of aggregation too, right? If you could explain that part too,
> that would be helpful. It's likely that you can just query the existing
> HBase data you've already created on the same cluster you're already using
> (provided you put the phoenix jar on all the region servers - use our 2.0.0
> version that just came out). Might be interesting to compare the amount of
> code necessary in each approach as well.
> Thanks,
> James
>
>
> On Sun, Aug 18, 2013 at 12:16 PM, Kiru Pakkirisamy <
> [EMAIL PROTECTED]> wrote:
>
> James,
> I am using the FuzzyRowFilter or the Gets within a Coprocessor. It looks
> like I cannot use your SkipScanFilter by itself, as it has lots of Phoenix
> imports. I thought of writing my own custom filter and saw that the
> FuzzyRowFilter in the 0.94 branch also has an implementation of
> getNextKeyHint(), only that it works well only with fixed-length keys if I
> want a complete match of the keys. After padding my keys to a fixed
> length, it seems to be fine.
> Once I confirm some key locality and other issues (like heap), I will try
> to benchmark this table alone against Phoenix on another cluster. Thanks.
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>
> ________________________________
>  From: James Taylor <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc: Kiru Pakkirisamy <[EMAIL PROTECTED]>
> Sent: Sunday, August 18, 2013 11:44 AM
> Subject: Re: Client Get vs Coprocessor scan performance
>
>
> Would be interesting to compare against Phoenix's Skip Scan