HBase, mail # user - Results from a Map/Reduce


RE: Results from a Map/Reduce
Jonathan Gray 2010-12-17, 21:19
If there's a customer waiting for the query, then you wouldn't want to have them wait for an MR job.

So what you're saying is you want to change this from on-demand scans to using MapReduce to aggregate roll-ups ahead of time and serve those?

In that case, your MR job doesn't need one final output, right?  You could do the Map over the entire table (or a start/stop row range, depending on schema) with the appropriate filters.  You would output (customerid + hour bucket) as the key and 1 as the value.  You'd get a reduce call for each customerid/hour bucket and would write that count to HBase.
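To make the key/value flow concrete, here is a plain-Java simulation of that roll-up: map emits (customerid + hour bucket, 1) per log row, the shuffle groups by key, and each reduce sums its group. This is a sketch of the data flow only, not a runnable Hadoop job, and the "customerId,timestampMillis" row format is a made-up stand-in for whatever the real log schema is.

```java
import java.util.*;

public class HourBucketRollup {

    static final long HOUR_MS = 3600_000L;

    // Map phase: emit one (customerId + "/" + hourBucket, 1) pair per log row.
    static List<Map.Entry<String, Integer>> map(List<String> rows) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String row : rows) {
            String[] parts = row.split(",");
            String customerId = parts[0];
            long hourBucket = Long.parseLong(parts[1]) / HOUR_MS;
            out.add(Map.entry(customerId + "/" + hourBucket, 1));
        }
        return out;
    }

    // Shuffle + reduce phase: group by key and sum the counts.
    // In the real job, each resulting entry would be written back to HBase.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
            "custA,3600000", "custA,3600500", "custA,7200000", "custB,3600000");
        System.out.println(reduce(map(rows)));
        // {custA/1=2, custA/2=1, custB/1=1}
    }
}
```

Because every customerid/hour key is independent, the reduces parallelize freely; you only need a single reducer if you want one global result.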

One of the ideas behind coprocessors is that you could do the per-customer scan/filter/aggregate as a parallel operation inside the RegionServers (without the overhead of MR or crossing JVMs) and might be able to increase the number of rows you can process within a reasonable amount of time.

Another approach to these kinds of aggregates, if you care about realtime at some level, is to use HBase's increment capabilities and a similar hour-bucketed schema but updated on demand instead of in batch.
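A sketch of that on-demand alternative: bump a per-(customer, hour) counter as each hit arrives, so serving the page is a point read rather than a million-row scan. In HBase this would be an atomic increment on a row keyed by customerid + hour bucket; here an in-memory map stands in for the table so the flow is self-contained, and the method names are illustrative, not any HBase API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class HitCounter {

    static final long HOUR_MS = 3600_000L;

    // Stand-in for the HBase table: rowKey -> counter.
    private final ConcurrentHashMap<String, AtomicLong> table = new ConcurrentHashMap<>();

    // Called once per hit at write time, so the counts are always current.
    public long recordHit(String customerId, long timestampMillis) {
        String rowKey = customerId + "/" + (timestampMillis / HOUR_MS);
        return table.computeIfAbsent(rowKey, k -> new AtomicLong()).incrementAndGet();
    }

    // Serving the web page becomes a handful of point reads, one per hour bucket.
    public long hitsFor(String customerId, long hourBucket) {
        AtomicLong v = table.get(customerId + "/" + hourBucket);
        return v == null ? 0 : v.get();
    }
}
```

The trade-off versus the batch roll-up is extra work on the write path in exchange for always-fresh counts and no MR latency.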

Yeah, this is a "basic" operation but that only means there are 100 ways to implement it :)

JG

> -----Original Message-----
> From: Peter Haidinyak [mailto:[EMAIL PROTECTED]]
> Sent: Friday, December 17, 2010 12:13 PM
> To: [EMAIL PROTECTED]
> Subject: RE: Results from a Map/Reduce
>
> What I have is basically a query on a log table that returns the number of hits
> per hour for customer X over Y days, with the ability to filter on columns.
> The results are displayed in a web page on demand.
> Currently, using a Scan, a popular customer can return millions of rows that I
> aggregate into 'Hits per hour' buckets. I wanted to push the aggregation back
> to a Map/Reduce and then have those results available to send back as a web
> page.
> This seems like such a basic operation that I am hoping there are 'Best
> Practices' or examples on how to accomplish this. I would also like a pony too.
> :-)
>
> Thanks
>
> -Pete
>
> -----Original Message-----
> From: Jonathan Gray [mailto:[EMAIL PROTECTED]]
> Sent: Friday, December 17, 2010 12:01 PM
> To: [EMAIL PROTECTED]
> Subject: RE: Results from a Map/Reduce
>
> There's not much in the way of examples for coprocessors besides the
> implementation of Security.  Check out HBASE-2000 and go from there.  If
> you're fairly new to HBase, then wait a couple months and there should be
> much better support around Coprocessors.
>
> I'm unsure of a way to have a final result returned back to the main()
> method.  What exactly are you trying to do with this result?  Available to you
> to do what with it?  Could the MR job put the result back into HBase or could
> your reducer contain the logic you need to use with the final result?
>
> > -----Original Message-----
> > From: Peter Haidinyak [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, December 17, 2010 11:56 AM
> > To: [EMAIL PROTECTED]
> > Subject: RE: Results from a Map/Reduce
> >
> > Does that mean that when the job.waitForCompletion(true) returns that
> > I have the results from the Reducer(s) available to me? I haven't seen
> > much on coprocessors, can you point me to some examples of their use?
> >
> > Thanks
> > -Pete
> >
> > -----Original Message-----
> > From: Jonathan Gray [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, December 17, 2010 11:13 AM
> > To: [EMAIL PROTECTED]
> > Subject: RE: Results from a Map/Reduce
> >
> > Hey Peter,
> >
> > That System.exit line is nothing important, just the main thread
> > waiting for the tasks to finish before closing.
> >
> > You're interested in having the MR job return a single result?  To do
> > that, you would need to roll-up the processing done in each of your
> > Map tasks into a single Reduce task.  With one reducer, you can have a
> > single point to do the final aggregation of the result.
> >
> > I'm not sure exactly what kind of aggregation you are doing but
> > funneling into a single reducer can range from no problem to don't