HBase >> mail # dev >> Scheduling Map Reduce Jobs

Re: Scheduling Map Reduce Jobs
Counter-question: why do you want to run M/R jobs to do aggregation? You
could do this in situ with a custom aggregation coprocessor. Essentially,
you would set a time span over which you would aggregate a row (or possibly
multiple rows, but then you have to be sure that they are on the same
region, which means using a custom split policy, or pre-splitting and
turning splitting off altogether). If you apply the coprocessor at scan,
flush, and compaction, you should get the same result as the M/R job
without all the messy IO. We don't really have a good guide for how to do
this kind of thing, but the concept here is similar to what Accumulo does
with iterators <http://accumulo.apache.org/1.4/examples/combiner.html>.
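
To make the idea a bit more concrete, here is a rough sketch of the
combining step in plain Java. It is not HBase coprocessor code: Cell,
WindowCombiner and combine() are made-up names for illustration, and I am
assuming a simple sum over a fixed time window. The point it shows is that
the combined output can be fed back into the same function along with newer
cells, which is why running it at scan, flush, and compaction should all
agree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowCombiner {
    /** Hypothetical stand-in for a stored cell; this is NOT an HBase type. */
    static final class Cell {
        final String row;
        final long timestampMs;
        final long value;

        Cell(String row, long timestampMs, long value) {
            this.row = row;
            this.timestampMs = timestampMs;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "@" + timestampMs + "=" + value;
        }
    }

    private final long windowMs;

    WindowCombiner(long windowMs) {
        this.windowMs = windowMs;
    }

    /**
     * Collapses all cells of the same row that fall into the same time window
     * into one cell holding their sum, stamped with the window start.
     */
    List<Cell> combine(List<Cell> cells) {
        Map<String, TreeMap<Long, Long>> sums = new TreeMap<>();
        for (Cell c : cells) {
            long windowStart = Math.floorDiv(c.timestampMs, windowMs) * windowMs;
            sums.computeIfAbsent(c.row, r -> new TreeMap<>())
                .merge(windowStart, c.value, Long::sum);
        }
        List<Cell> out = new ArrayList<>();
        sums.forEach((row, windows) ->
            windows.forEach((start, total) -> out.add(new Cell(row, start, total))));
        return out;
    }

    public static void main(String[] args) {
        WindowCombiner hourly = new WindowCombiner(3_600_000L);

        // First pass (think "flush"): two cells in the same hour become one.
        List<Cell> flushed = hourly.combine(List.of(
            new Cell("row1", 1_000L, 5),
            new Cell("row1", 2_000L, 7)));

        // Later pass (think "compaction"): earlier output plus newer cells.
        List<Cell> merged = new ArrayList<>(flushed);
        merged.add(new Cell("row1", 3_000L, 1));
        merged.add(new Cell("row1", 3_600_500L, 4));

        // prints [row1@0=13, row1@3600000=4]
        System.out.println(hourly.combine(merged));
    }
}
```

In a real deployment this logic would have to live behind the region
server's scan, flush, and compaction hooks, and the multi-row case still
needs the rows to sit in the same region as described above.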

But to answer your original question, I don't use anything other than cron
for that kind of stuff (that's what it's there for :).

-Jesse

-------------------
Jesse Yates
240-888-2200
@jesse_yates
jyates.github.com
On Mon, Apr 23, 2012 at 1:34 AM, apatro <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'd like to know if there is some alternative to using cron for
> scheduling Map Reduce jobs, one in which I can incorporate my own scheduling
> logic. For instance, to perform aggregation on table data at a particular
> hour of the day or on a particular day of the week, and the like.
>
> Thanks in advance :)
>
> Arati Patro
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/Scheduling-Map-Reduce-Jobs-tp3931839p3931839.html
> Sent from the HBase - Developer mailing list archive at Nabble.com.
>