Results from a Map/Reduce (HBase user mailing list)


RE: Results from a Map/Reduce
If you do aggregation, your queries will most likely be well under a second.  The aggregates should reduce the amount of data that needs to be read by several orders of magnitude, no?
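Those buckets are small: Y days of hours is only Y x 24 rows per customer.
A minimal sketch of what the client-side read could then look like, assuming
a rollup table "hourly_hits" keyed "customerId/hourBucket" with the count in
f:count (all names invented for illustration, not from the thread):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HourlyHitsReader {
      // Hits per hour for one customer over [startHour, endHour); assumes
      // hour buckets have a fixed digit width so lexicographic row order
      // matches numeric order.
      public static long[] read(String customerId, long startHour, long endHour)
          throws IOException {
        HTable table = new HTable(HBaseConfiguration.create(), "hourly_hits");
        try {
          Scan scan = new Scan(Bytes.toBytes(customerId + "/" + startHour),
                               Bytes.toBytes(customerId + "/" + endHour));
          long[] hits = new long[(int) (endHour - startHour)];
          ResultScanner scanner = table.getScanner(scan);
          for (Result r : scanner) {
            String[] parts = Bytes.toString(r.getRow()).split("/");
            hits[(int) (Long.parseLong(parts[1]) - startHour)] = Bytes.toLong(
                r.getValue(Bytes.toBytes("f"), Bytes.toBytes("count")));
          }
          scanner.close();
          return hits;
        } finally {
          table.close();
        }
      }
    }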

> -----Original Message-----
> From: Peter Haidinyak [mailto:[EMAIL PROTECTED]]
> Sent: Friday, December 17, 2010 1:43 PM
> To: [EMAIL PROTECTED]
> Subject: RE: Results from a Map/Reduce
>
> So the idea is to aggregate the final result into an HBase table and then
> query that table from the client. I'm going to have to find a quicker method.
> Currently, on my small three-node cluster with 100 million rows, it takes a
> couple of minutes to do a scan that brings back several million rows. My boss
> wants the query to be in the 'less than five second' range.
>
> Thanks
>
> -Pete
>
> -----Original Message-----
> From: Jonathan Gray [mailto:[EMAIL PROTECTED]]
> Sent: Friday, December 17, 2010 1:19 PM
> To: [EMAIL PROTECTED]
> Subject: RE: Results from a Map/Reduce
>
> If there's a customer waiting for the query, then you wouldn't want to have
> them wait for an MR job.
>
> So what you're saying is you want to change this from on-demand scans to
> using MapReduce to aggregate roll-ups ahead of time and serve those?
>
> In that case, your MR job doesn't need one final output, right?  You could do
> the Map over the entire table (or start/stop rows depending on schema) and
> with the appropriate filters.  You would output (customerid + hour bucket) as
> the key and 1 for the value.  You'd get a reduce for each customerid/hour
> bucket and would write that to HBase.
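Not part of the original thread: a minimal sketch of the roll-up job JG
describes, using the 0.90-era TableMapper/TableReducer API. The table names
("access_log", "hourly_hits"), the f:count column, and the
"customerId/timestampMillis" row-key layout are all assumptions:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Job;

    public class HourlyHitsRollup {

      // Emits (customerId/hourBucket, 1) for every log row scanned.
      static class HitMapper
          extends TableMapper<ImmutableBytesWritable, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(ImmutableBytesWritable row, Result value,
            Context ctx) throws IOException, InterruptedException {
          String key = Bytes.toString(row.get(), row.getOffset(), row.getLength());
          String[] parts = key.split("/");   // customerId / timestampMillis
          long hourBucket = Long.parseLong(parts[1]) / (3600L * 1000);
          ctx.write(new ImmutableBytesWritable(
              Bytes.toBytes(parts[0] + "/" + hourBucket)), ONE);
        }
      }

      // Sums the 1s for each customer/hour bucket; one Put per bucket.
      static class HitReducer extends TableReducer<ImmutableBytesWritable,
          LongWritable, ImmutableBytesWritable> {
        @Override
        protected void reduce(ImmutableBytesWritable key,
            Iterable<LongWritable> ones, Context ctx)
            throws IOException, InterruptedException {
          long sum = 0;
          for (LongWritable one : ones) sum += one.get();
          Put put = new Put(key.get());
          put.add(Bytes.toBytes("f"), Bytes.toBytes("count"), Bytes.toBytes(sum));
          ctx.write(key, put);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hourly-hits-rollup");
        job.setJarByClass(HourlyHitsRollup.class);
        Scan scan = new Scan();      // set start/stop rows and filters here
        scan.setCaching(500);
        scan.setCacheBlocks(false);  // don't pollute the block cache from MR
        TableMapReduceUtil.initTableMapperJob("access_log", scan,
            HitMapper.class, ImmutableBytesWritable.class, LongWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("hourly_hits", HitReducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }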
>
> One of the ideas behind coprocessors is that you could do the per-customer
> scan/filter/aggregate as a parallel operation inside the RegionServers
> (without the overhead of MR or cross-JVM calls) and might be able to increase
> the number of rows you can process within a reasonable amount of time.
>
> Another approach to these kinds of aggregates, if you care about realtime at
> some level, is to use HBase's increment capabilities and a similar hour-
> bucketed schema but updated on demand instead of in batch.
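A sketch of that last idea too (again not from the thread): the same
hour-bucket rows, but bumped inline with incrementColumnValue as each hit is
logged. Table and column names are placeholders:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HitCounter {
      private final HTable table;

      public HitCounter() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        table = new HTable(conf, "hourly_hits");   // hypothetical rollup table
      }

      // Called once per hit; atomically bumps that customer/hour counter,
      // so the aggregate is always current with no batch job involved.
      public void recordHit(String customerId, long timestampMillis)
          throws IOException {
        long hourBucket = timestampMillis / (3600L * 1000);
        byte[] row = Bytes.toBytes(customerId + "/" + hourBucket);
        table.incrementColumnValue(row, Bytes.toBytes("f"),
            Bytes.toBytes("count"), 1L);
      }
    }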
>
> Yeah, this is a "basic" operation but that only means there are 100 ways to
> implement it :)
>
> JG
>
> > -----Original Message-----
> > From: Peter Haidinyak [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, December 17, 2010 12:13 PM
> > To: [EMAIL PROTECTED]
> > Subject: RE: Results from a Map/Reduce
> >
> > What I have is basically a query on a log table to return the number
> > of hits per hour for customer X for Y days, with the ability to
> > filter on columns; the results are displayed in a web page on demand.
> > Currently, using a Scan, with a popular customer I can get back
> > millions of rows to aggregate into 'Hits per hour' buckets. I wanted
> > to push the aggregation back to a Map/Reduce and then have those
> > results available to send back as a web page.
> > This seems like such a basic operation that I am hoping there are
> > 'Best Practices' or examples of how to accomplish this. I would like a
> > pony too. :-)
> >
> > Thanks
> >
> > -Pete
> >
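For contrast with the roll-up approach, the on-demand scan Peter describes
probably looks roughly like this (the row-key layout and the filtered column
are guesses); the cost is shipping millions of raw rows to the client and
bucketing them there:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class OnDemandHitsPerHour {
      // Scans raw log rows (keyed "customerId/timestampMillis") and buckets
      // them by hour on the client side.
      public static Map<Long, Long> count(String customerId, long startMs,
          long endMs) throws IOException {
        HTable logTable = new HTable(HBaseConfiguration.create(), "access_log");
        Scan scan = new Scan(Bytes.toBytes(customerId + "/" + startMs),
                             Bytes.toBytes(customerId + "/" + endMs));
        // Example column filter; family/qualifier/value are placeholders.
        scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes("f"),
            Bytes.toBytes("status"), CompareFilter.CompareOp.EQUAL,
            Bytes.toBytes("200")));
        scan.setCaching(1000);   // fewer round trips per scanner.next()
        Map<Long, Long> buckets = new HashMap<Long, Long>();
        ResultScanner scanner = logTable.getScanner(scan);
        for (Result r : scanner) {
          String[] parts = Bytes.toString(r.getRow()).split("/");
          long hour = Long.parseLong(parts[1]) / (3600L * 1000);
          Long prev = buckets.get(hour);
          buckets.put(hour, prev == null ? 1L : prev + 1L);
        }
        scanner.close();
        logTable.close();
        return buckets;
      }
    }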
> > -----Original Message-----
> > From: Jonathan Gray [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, December 17, 2010 12:01 PM
> > To: [EMAIL PROTECTED]
> > Subject: RE: Results from a Map/Reduce
> >
> > There's not much in the way of examples for coprocessors besides the
> > implementation of Security.  Check out HBASE-2000 and go from there.
> > If you're fairly new to HBase, then wait a couple months and there
> > should be much better support around Coprocessors.
> >
> > I'm unsure of a way to have a final result returned to the main()
> > method.  What exactly are you trying to do with this result?
> > Available to you to do what with it?  Could the MR job put the result
> > back into HBase, or could your reducer contain the logic you need to use
> > with the final result?
> >
> > > -----Original Message-----
> > > From: Peter Haidinyak [mailto:[EMAIL PROTECTED]]
> > > Sent: Friday, December 17, 2010 11:56 AM