HBase >> mail # user >> Client Get vs Coprocessor scan performance


Re: Client Get vs Coprocessor scan performance
Kiru,
Is the column qualifier for the key value storing the double different
for different rows? Not sure I understand what you're grouping over.
Maybe five rows' worth of sample input and expected output would help.
Thanks,
James
On Aug 19, 2013, at 1:37 AM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:

> James,
> I have only one column family, "cp". Yes, that is how I store the Double. No, the doubles are always positive.
> The keys look like "A14568 ". There are fewer than a million of them, and I added the letters to randomize them.
> I group them based on the C_ suffix and, say, order them by the Double (to simplify things).
> Is there a way to do a sort of "user-defined function" on a column? That would take care of my calculation on that double.
> Thanks again.
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>
> ________________________________
> From: James Taylor <[EMAIL PROTECTED]>
> To: Kiru Pakkirisamy <[EMAIL PROTECTED]>
> Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Sunday, August 18, 2013 5:34 PM
> Subject: Re: Client Get vs Coprocessor scan performance
>
>
> Kiru,
> What's your column family name? Just to confirm, the column qualifier of
> your key value is C_10345 and this stores a value as a Double using
> Bytes.toBytes(double)? Are any of the Double values negative? Any other key
> values?
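The negative-double question matters because Bytes.toBytes(double) stores the raw big-endian IEEE-754 bits, which do not sort correctly for negative values under HBase's unsigned lexicographic byte comparison. A minimal sketch of the effect, mimicking the HBase encoding with plain JDK calls rather than the actual Bytes class:

```java
import java.nio.ByteBuffer;

public class DoubleSortDemo {
    // Mimics Bytes.toBytes(double): big-endian IEEE-754 bit pattern.
    static byte[] encode(double d) {
        return ByteBuffer.allocate(8).putLong(Double.doubleToLongBits(d)).array();
    }

    // Unsigned lexicographic comparison, as HBase compares keys and values.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int x = a[i] & 0xFF, y = b[i] & 0xFF;
            if (x != y) return x - y;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Positives sort correctly: 1.0 encodes below 2.0.
        System.out.println(compareUnsigned(encode(1.0), encode(2.0)) < 0);   // true
        // Negatives sort reversed: -2.0 encodes above -1.0.
        System.out.println(compareUnsigned(encode(-2.0), encode(-1.0)) > 0); // true
        // Any negative (sign bit set) sorts above any positive.
        System.out.println(compareUnsigned(encode(-1.0), encode(2.0)) > 0);  // true
    }
}
```

This is why schema layers such as Phoenix flip the sign bit when encoding signed numbers; if, as here, the doubles are always positive, the raw encoding happens to sort correctly.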
>
> Can you give me an idea of the kind of fuzzy filtering you're doing on the
> 7 char row key? We may want to model that as a set of row key columns in
> Phoenix to leverage the skip scan more.
>
> How about I model your aggregation as an AVG over a group of rows? What
> would your GROUP BY expression look like? Are you grouping based on a part
> of the 7 char row key? Or on some other key value?
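To make the modeling question concrete, here is a hedged sketch of what the Phoenix mapping could look like, assuming (hypothetically) that the 7-char row key splits into a 4-char prefix and a 3-char group code; the table name, column names, and key split are illustrative placeholders, not the actual schema from this thread, and UNSIGNED_DOUBLE is assumed to be available in the Phoenix version in use:

```java
public class PhoenixQuerySketch {
    // Hypothetical DDL mapping the existing HBase table. The composite
    // primary key is what lets Phoenix's skip scan prune by either part.
    static final String DDL =
        "CREATE TABLE KIRU_TABLE (" +
        "  PREFIX CHAR(4) NOT NULL," +
        "  GROUP_CODE CHAR(3) NOT NULL," +
        // UNSIGNED_DOUBLE matches raw Bytes.toBytes(double) for positive values.
        "  CP.VAL UNSIGNED_DOUBLE," +
        "  CONSTRAINT PK PRIMARY KEY (PREFIX, GROUP_CODE))";

    // An AVG aggregation grouped on part of the row key, as suggested above.
    static String groupByQuery() {
        return "SELECT GROUP_CODE, AVG(CP.VAL) FROM KIRU_TABLE " +
               "WHERE PREFIX IN ('A145','B201') GROUP BY GROUP_CODE";
    }

    public static void main(String[] args) {
        System.out.println(groupByQuery());
    }
}
```

In a live setup these strings would be executed over Phoenix's JDBC driver; the IN clause on the leading key column is the kind of predicate the skip scan can turn into seeks rather than a full scan.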
>
> Thanks,
> James
>
>
> On Sun, Aug 18, 2013 at 2:16 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:
>
>
>> James,
>> Rowkey: String, length 7
>> Column qualifier: String, variable length, e.g. C_10345
>> Column value: Double
>>
>> If I can create a Phoenix schema mapping to this existing table, that would
>> be great. I actually do a group by on the column values and return another
>> value, which is a function of the stored value and an input double. The input
>> is a Map<String, Double> and the return is also a Map<String, Double>.
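For reference, the aggregation described here has roughly this shape in plain Java. Only the Map<String, Double> in/out signature comes from the thread; the combining function f and the sample data are made-up placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;

public class GroupScore {
    // For each column qualifier (e.g. "C_10345"), combine the stored double
    // with the caller-supplied input double for the same qualifier.
    static Map<String, Double> score(Map<String, Double> stored,
                                     Map<String, Double> input,
                                     DoubleBinaryOperator f) {
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : stored.entrySet()) {
            Double in = input.get(e.getKey());
            if (in != null) {
                out.put(e.getKey(), f.applyAsDouble(e.getValue(), in));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> stored = Map.of("C_10345", 2.0, "C_20001", 3.0);
        Map<String, Double> input  = Map.of("C_10345", 4.0);
        // Placeholder f: product of the stored value and the input value.
        System.out.println(score(stored, input, (a, b) -> a * b)); // {C_10345=8.0}
    }
}
```

Pushed into a coprocessor, this is the per-region work; done in Phoenix, the same combine step would become an expression over the value column.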
>>
>>
>> Regards,
>> - kiru
>>
>>
>> Kiru Pakkirisamy | webcloudtech.wordpress.com
>>
>> ________________________________
>> From: James Taylor <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; Kiru Pakkirisamy <[EMAIL PROTECTED]>
>> Sent: Sunday, August 18, 2013 2:07 PM
>>
>> Subject: Re: Client Get vs Coprocessor scan performance
>>
>> Kiru,
>> If you're able to post the key values, row key structure, and data types
>> you're using, I can post the Phoenix code to query against it. You're doing
>> some kind of aggregation too, right? If you could explain that part too,
>> that would be helpful. It's likely that you can just query the existing
>> HBase data you've already created on the same cluster you're already using
>> (provided you put the phoenix jar on all the region servers - use our 2.0.0
>> version that just came out). Might be interesting to compare the amount of
>> code necessary in each approach as well.
>> Thanks,
>> James
>>
>>
>> On Sun, Aug 18, 2013 at 12:16 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:
>>
>>
>> James,
I am using the FuzzyRowFilter or the Gets within a Coprocessor. It looks
like I cannot use your SkipScanFilter by itself, as it has lots of Phoenix
imports. I thought of writing my own custom filter, but then saw that the
FuzzyRowFilter in the 0.94 branch also has an implementation of
getNextKeyHint(); the catch is that it only works well with fixed-length
keys when I want a complete match. After padding my keys to a fixed
length, it seems to be fine.
Once I confirm key locality and some other issues (like heap), I will try
to benchmark this table alone against Phoenix on another cluster. Thanks.
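The FuzzyRowFilter match rule mentioned above takes a (pattern, mask) pair, where a mask byte of 0 means "this byte must match exactly" and 1 means "any byte". The sketch below re-implements just the per-row check in plain Java (not the actual filter or its getNextKeyHint() seek logic) to show why padding to a fixed key length matters; the sample keys echo the 7-char padded keys from this thread:

```java
public class FuzzyMatchDemo {
    // mask[i] == 0: row byte must equal pattern byte; mask[i] == 1: any byte.
    // Mirrors FuzzyRowFilter's per-row check; the real filter additionally
    // seeks ahead via getNextKeyHint(), which assumes fixed-length keys.
    static boolean fuzzyMatch(byte[] row, byte[] pattern, byte[] mask) {
        if (row.length < pattern.length) return false; // short keys cannot fully match
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pattern = "??4568 ".getBytes();   // 7 bytes, space-padded
        byte[] mask    = {1, 1, 0, 0, 0, 0, 0};  // first two bytes are fuzzy
        System.out.println(fuzzyMatch("AB4568 ".getBytes(), pattern, mask)); // true
        System.out.println(fuzzyMatch("AB4568X".getBytes(), pattern, mask)); // false
        // An unpadded 6-byte key can never completely match a 7-byte pattern.
        System.out.println(fuzzyMatch("AB4568".getBytes(), pattern, mask));  // false
    }
}
```

With the real filter, the pattern/mask pairs are passed to the FuzzyRowFilter constructor as a list of byte-array pairs and the filter is set on the Scan; the padding trick above is what makes its complete-match behavior reliable.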