HBase, mail # user - Coprocessors


Re: Coprocessors
Viral Bajaria 2013-04-25, 22:28
Phoenix might be able to solve the problem if the keys are structured in
the binary format that it understands; otherwise you are better off reloading
that data into a table created via Phoenix. But I will let James tackle this
question.

Regarding your use case, why can't you do the aggregation using observers?
You should be able to do the aggregation and return a new Scanner to your
client.
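
A rough sketch of the "return a new Scanner" idea, written against plain
Java iterators so it runs without HBase on the classpath. SummingScanner is
a hypothetical name, and the Iterator<Long> it wraps is only a stand-in for
the region's scanner. In a real coprocessor the wrapping would presumably
happen in a RegionObserver scanner hook such as postScannerOpen, and the
aggregate would be packed into a result row rather than returned as a Long.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical stand-in for a wrapping scanner: it drains the inner
 * "scanner" on the server side and hands the client a single aggregated
 * value instead of every row.
 */
final class SummingScanner implements Iterator<Long> {
    private final Iterator<Long> inner; // stand-in for the region's own scanner
    private boolean done;               // true once the aggregate has been emitted

    SummingScanner(Iterator<Long> inner) {
        this.inner = inner;
    }

    @Override
    public boolean hasNext() {
        // Exactly one aggregated value is produced, even for an empty input.
        return !done;
    }

    @Override
    public Long next() {
        if (done) {
            throw new NoSuchElementException("aggregate already returned");
        }
        long sum = 0L;
        while (inner.hasNext()) {
            sum += inner.next(); // aggregate locally; rows never leave the server
        }
        done = true;
        return sum;
    }
}
```

The client then reads one value per region and combines them, which is
where the network savings would come from.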

And Lars is right about the range scans that Phoenix does. It restricts
the scan ranges and will also run parallel scans for you based on what you
select/filter.

-Viral
On Thu, Apr 25, 2013 at 3:12 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> I don't think Phoenix will solve his problem.
>
> He also needs to explain more about his problem before we can start to
> think about the problem.
>
>
> On Apr 25, 2013, at 4:54 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > You might want to have a look at Phoenix (
> > https://github.com/forcedotcom/phoenix), which does that and more, and
> > gives a SQL/JDBC interface.
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> > From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Thursday, April 25, 2013 2:44 PM
> > Subject: Coprocessors
> >
> >
> > Folks:
> >
> > This is my first post on the HBase user mailing list.
> >
> > I have the following scenario:
> > I have an HBase table of up to a billion keys. I'm looking to support an
> > application where, on some user action, I'd need to fetch multiple columns
> > for up to 250K keys and do some sort of aggregation on them. Fetching all
> > that data and doing the aggregation in my application takes about a minute.
> >
> > I'm looking to co-locate the aggregation logic with the region servers to
> > a. Distribute the aggregation
> > b. Avoid having to fetch large amounts of data over the network (this
> >    could potentially be cross-datacenter)
> >
> > Neither observers nor aggregation endpoints work for this use case.
> > Observers don't return data to the client, while aggregation endpoints
> > work in the context of scans, not a multi-get. (Are these correct
> > assumptions?)
> >
> > I'm looking to write a service that runs alongside the region servers
> > and acts as a proxy between my application and the region servers.
> >
> > I plan to use the logic in the HBase client's HConnectionManager to segment
> > my request of 1M rowkeys into sub-requests per region server. These are
> > sent over to the proxy, which fetches the data from the region server,
> > aggregates locally and sends the data back. Does this sound reasonable, or
> > even a useful thing to pursue?
> >
> > Regards,
> > -sudarshan
>
>
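
The per-region-server segmentation described in the original post could be
sketched roughly as follows. This is a minimal, self-contained illustration
and not the HBase client's actual logic: RegionBatcher, groupByServer and
mergePartialSums are hypothetical names, the map of region start keys to
server names is assumed to be supplied by the caller, and row keys are
modelled as ASCII strings (HBase itself compares raw bytes). It also assumes
a sum-style aggregate, so partial results can simply be added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RegionBatcher {
    // Hypothetical region metadata: region start key -> hosting server name.
    // The first region is assumed to have the empty string as its start key.
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    RegionBatcher(Map<String, String> regionStartKeyToServer) {
        startKeyToServer.putAll(regionStartKeyToServer);
    }

    /** Groups row keys into one batch per server hosting the owning region. */
    Map<String, List<String>> groupByServer(List<String> rowKeys) {
        Map<String, List<String>> batches = new TreeMap<>();
        for (String key : rowKeys) {
            // The owning region is the one with the greatest start key <= row key.
            Map.Entry<String, String> region = startKeyToServer.floorEntry(key);
            if (region == null) {
                throw new IllegalArgumentException("no region covers key: " + key);
            }
            batches.computeIfAbsent(region.getValue(), s -> new ArrayList<>()).add(key);
        }
        return batches;
    }

    /** Combines the partial sums returned by each server-side proxy. */
    static long mergePartialSums(Iterable<Long> partials) {
        long total = 0L;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }
}
```

Each batch would go to the proxy next to the matching region server, which
returns one partial result; only those partials cross the network.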