HBase >> mail # user >> Incremental pre-aggregation strategy with MapReduce


Re: Incremental pre-aggregation strategy with MapReduce
Could you give some insights into the kind of measurements you are saving
and a sample aggregate?

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Fri, Sep 2, 2011 at 9:29 PM, Doug Meil <[EMAIL PROTECTED]> wrote:

>
> What Stack says.  Plus, for other tips see...
>
> http://hbase.apache.org/book.html#mapreduce
>
> http://hbase.apache.org/book.html#schema
>
> On 9/2/11 11:15 AM, "Stack" <[EMAIL PROTECTED]> wrote:
>
> >Can you rely on versioning?  If the MR job runs once a day, only aggregate
> >what's changed in the last day?
> >
> >Turn off speculative execution.
> >
> >You'll need a means of dealing with MR jobs failing; i.e. throw away
> >the aggregations done by a failed job, so that they are not compounded
> >with the aggregations done by the successful job.
> >
> >St.Ack
> >
> >On Thu, Sep 1, 2011 at 11:06 PM, Steinmaurer Thomas
> ><[EMAIL PROTECTED]> wrote:
> >> Hello,
> >>
> >>
> >>
> >> we are storing detailed measurement values in a Hadoop/HBase cluster.
> >> For end-user / analysis tasks, we need to provide aggregated values
> >> along a date dimension (aggregate by day, month, quarter, year). The
> >> aggregates shall be stored in an Oracle database for easier data
> >> mangling via different client types (OLAP clients ...)
> >>
> >>
> >>
> >> A brute-force approach for generating the aggregates is to run a
> >> nightly MapReduce job which processes the entire HBase table and does
> >> the aggregation.
> >>
> >>
> >>
> >> I wonder, are there any best practices on how to do the
> >> pre-aggregation via a MapReduce job in an incremental way? For
> >> example, how to detect changes in HBase since the last MR job run,
> >> etc.
> >>
> >>
> >>
> >> Thanks!
> >>
> >>
> >>
> >> Regards,
> >>
> >> Thomas
> >>
> >>
> >>
> >>
>
>
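The two suggestions in the thread (aggregate only what changed since the last run, and discard the output of a failed job) can be sketched in plain Python. This is a minimal in-memory model, not HBase or MapReduce code: `Cell`, `AggregateStore`, `aggregate_window` and `run_incremental_job` are hypothetical names invented for illustration, and `written_at` stands in for the HBase cell timestamp. In a real job the window filter would probably be expressed as a time range on the scan (HBase's `Scan.setTimeRange(minStamp, maxStamp)` selects a half-open range of this kind). The sketch also assumes measurements are append-only; if existing cells can be overwritten, the affected days would need to be recomputed rather than incremented.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One stored measurement plus the time it was written (hypothetical model)."""
    row_key: str
    day: str          # date bucket of the measurement, e.g. "2011-09-01"
    value: float
    written_at: int   # stands in for the HBase cell timestamp


@dataclass
class AggregateStore:
    """Hypothetical stand-in for the table that holds the daily aggregates."""
    totals: dict = field(default_factory=dict)  # day -> (sum, count)
    watermark: int = 0                          # end of the last committed window

    def commit(self, partials, new_watermark):
        """Merge one run's partial aggregates and advance the watermark together."""
        merged = dict(self.totals)
        for day, (s, n) in partials.items():
            old_s, old_n = merged.get(day, (0.0, 0))
            merged[day] = (old_s + s, old_n + n)
        # State is swapped in only after every partial has been merged.
        self.totals, self.watermark = merged, new_watermark


def aggregate_window(cells, min_ts, max_ts):
    """Aggregate only the cells written in [min_ts, max_ts), keyed by day."""
    partials = defaultdict(lambda: (0.0, 0))
    for c in cells:
        if min_ts <= c.written_at < max_ts:
            s, n = partials[c.day]
            partials[c.day] = (s + c.value, n + 1)
    return dict(partials)


def run_incremental_job(cells, store, now, fail=False):
    """One job run: aggregate the window since the last watermark, then commit."""
    partials = aggregate_window(cells, store.watermark, now)
    if fail:
        # Simulated job failure: the partials are dropped, the store is untouched.
        raise RuntimeError("job failed before commit")
    store.commit(partials, now)
    return partials


# Usage: first run, then a failed run, then a successful rerun of the same window.
store = AggregateStore()
cells = [Cell("r1", "2011-09-01", 10.0, 100), Cell("r2", "2011-09-01", 5.0, 150)]
run_incremental_job(cells, store, now=200)

cells.append(Cell("r3", "2011-09-02", 7.0, 250))
try:
    run_incremental_job(cells, store, now=300, fail=True)
except RuntimeError:
    pass  # watermark is still 200, so nothing has been compounded

run_incremental_job(cells, store, now=300)
print(store.totals)
```

Because the watermark moves only on commit, a rerun after a failure covers the same window again and the failed run's partial results never reach the totals. Duplicate task attempts are a separate concern: on Hadoop releases of that era, speculative execution is controlled by the `mapred.map.tasks.speculative.execution` and `mapred.reduce.tasks.speculative.execution` properties (renamed `mapreduce.map.speculative` and `mapreduce.reduce.speculative` in later releases), and setting both to `false` keeps a task that writes directly to a table from running twice.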