HBase >> mail # user >> Re: Coprocessors


Sudarshan Kadambi 2013-04-25, 21:57
lars hofhansl 2013-04-25, 22:06
Sudarshan Kadambi 2013-04-25, 21:44
lars hofhansl 2013-04-25, 21:54
Michael Segel 2013-04-25, 22:12
Viral Bajaria 2013-04-25, 22:28
Gary Helmling 2013-04-25, 22:35
James Taylor 2013-04-25, 22:44
Sudarshan Kadambi 2013-04-25, 22:36
Re: Coprocessors
Hi,

Let's reiterate what you've said.

You have a set of objects <O1, O2, ..., On> and a field type <F1>, where F1 is part of your composite key. You want to fetch back a set of rows and then do some aggregation on the attributes.
There was a similar discussion on this list where someone had a random set of values and was having performance issues.

If your set of objects is in sort order and you have only one field type <F1>, you should be able to do multi-gets.

Are you currently using multi-gets?
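
For reference, the key construction behind such a multi-get can be sketched in plain Python. This is only an illustration and does not call any HBase client: the `-` separator follows the O1-F1 row keys described in the quoted message, while `build_row_keys`, `batches` and the batch size are hypothetical names and values chosen here.

```python
from typing import Iterable, Iterator, List


def build_row_keys(object_ids: Iterable[str], field_type: str, sep: str = "-") -> List[str]:
    """Return sorted, de-duplicated composite row keys: <object-id><sep><field-type>."""
    return sorted({f"{oid}{sep}{field_type}" for oid in object_ids})


def batches(keys: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` keys, one slice per multi-get call."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


if __name__ == "__main__":
    # Hypothetical input: three objects and a single field type.
    keys = build_row_keys(["O3", "O1", "O2"], "F1")
    for batch in batches(keys, size=2):
        print(batch)
```

Sorting the keys first keeps each batch within a narrow key range, which is the "sort order" condition mentioned above; whether that helps in practice depends on how the rows are spread across regions.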

On Apr 25, 2013, at 5:36 PM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <[EMAIL PROTECTED]> wrote:

> Michael: Fair enough. Let me see what relevant information I can add to what I've already said:
>
> 1. To Lars' point, my 250K keys are unlikely to fall into fewer than 250K sub-ranges.
> 2. Here's a bit more about my schema:
> 2.1 My rowkeys are composed of 2 entities - let's call them object-id and field-type. An object (O1) has 100s of field types (F1,F2,F3...). Each object-id - field-type pair has 100s of attributes (A1,A2,A3).
> 2.2 My rowkeys are O1-F1, O1-F2, O1-F3, etc.
> 2.3 My primary application (not the one my original post was about) accesses by these rowkeys.
> 2.4 My application that does aggregation is given a bunch of objects <O1, O2, O3>, a field-type <F1>, a bunch of attributes <A1,A2> and some computation to perform.
> 2.5 As you can see, scans are unlikely to be useful when fetching O1-F1, O2-F1, O3-F1 etc.
>
> Viral: How do I tackle aggregation using observers? Let's say I override the postGet method. I do a multi-get from my client and my method gets called on each region server for each row. What is the next step with this approach?
>
>
> ----- Original Message -----
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED], [EMAIL PROTECTED]
> Cc: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> At: Apr 25 2013 18:12:46
>
> I don't think Phoenix will solve his problem.
>
> He also needs to explain more about his problem before we can start to think about the problem.
>
>
> On Apr 25, 2013, at 4:54 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> You might want to have a look at Phoenix (https://github.com/forcedotcom/phoenix), which does that and more, and gives a SQL/JDBC interface.
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>> From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Thursday, April 25, 2013 2:44 PM
>> Subject: Coprocessors
>>
>>
>> Folks:
>>
>> This is my first post on the HBase user mailing list.
>>
>> I have the following scenario:
>> I have an HBase table of up to a billion keys. I'm looking to support an application where on some user action, I'd need to fetch multiple columns for up to 250K keys and do some sort of aggregation on them. Fetching all that data and doing the aggregation in my application takes about a minute.
>>
>> I'm looking to co-locate the aggregation logic with the region servers to
>> a. Distribute the aggregation
>> b. Avoid having to fetch large amounts of data over the network (this could potentially be cross-datacenter)
>>
>> Neither observers nor aggregation endpoints work for this use case. Observers don't return data back to the client, while aggregation endpoints work in the context of scans, not a multi-get (are these correct assumptions?).
>>
>> I'm looking to write a service that runs alongside the region servers and acts as a proxy between my application and the region servers.
>>
>> I plan to use the logic in HBase client's HConnectionManager to segment my request of 1M rowkeys into sub-requests per region-server. These are sent over to the proxy which fetches the data from the region server, aggregates locally and sends data back. Does this sound reasonable or even a useful thing to pursue?
>>
>> Regards,
>> -sudarshan
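
The proxy idea in the original post (split the keys per region server, aggregate next to the data, merge the partial results on the client) can be sketched as a small simulation. It is plain Python and makes no HBase calls: the region layout, the server names, and the functions `server_for`, `partition_by_server`, `aggregate_locally` and `merge` are all hypothetical, and a sum/count pair stands in for whatever computation the application needs.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical cluster layout: sorted region start keys and the server hosting each region.
REGION_START_KEYS = ["", "O2", "O4"]
REGION_SERVERS = ["rs-a", "rs-b", "rs-a"]


def server_for(row_key: str) -> str:
    """Find the server hosting the region whose key range contains row_key."""
    idx = bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[idx]


def partition_by_server(row_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group the requested row keys into one sub-request per server."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in row_keys:
        groups[server_for(key)].append(key)
    return dict(groups)


def aggregate_locally(store: Dict[str, Dict[str, float]],
                      row_keys: Iterable[str],
                      attribute: str) -> Tuple[float, int]:
    """Simulate the server-side step: read the rows and return a (sum, count) partial."""
    values = [store[k][attribute] for k in row_keys
              if k in store and attribute in store[k]]
    return sum(values), len(values)


def merge(partials: Iterable[Tuple[float, int]]) -> Tuple[float, int]:
    """Client-side step: combine the per-server partial results."""
    partials = list(partials)
    return sum(s for s, _ in partials), sum(c for _, c in partials)


if __name__ == "__main__":
    # Hypothetical rows keyed as <object-id>-<field-type>, each holding attribute A1.
    store = {"O1-F1": {"A1": 1.0}, "O2-F1": {"A1": 2.0}, "O4-F1": {"A1": 4.0}}
    groups = partition_by_server(store.keys())
    partials = [aggregate_locally(store, keys, "A1") for keys in groups.values()]
    print(groups)
    print(merge(partials))
```

Only the small (sum, count) pairs cross the network in this model, which is the saving the post is after; the sketch says nothing about how a real proxy would be deployed or how it would read from a region server.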
James Taylor 2013-04-25, 23:00
Sudarshan Kadambi 2013-04-25, 23:19
James Taylor 2013-04-25, 23:51
James Taylor 2013-05-02, 00:01