Aji Janis 2013-03-04, 16:11
Re: Accumulo and Mapreduce
Hi Aji,

Oozie is a mature project for managing MapReduce workflows.
http://oozie.apache.org/

-Sandy
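
For reference, a minimal sketch of starting such a workflow from Java with Oozie's client API; the Oozie URL, HDFS application path, and cluster addresses below are placeholders, and the workflow.xml deployed at that path (not shown) is what would actually define the chained map-reduce actions:

import java.util.Properties;

import org.apache.oozie.client.OozieClient;

public class SubmitWorkflow {
    public static void main(String[] args) throws Exception {
        // Placeholder Oozie server URL.
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = oozie.createConfiguration();
        // The workflow.xml under this HDFS path would chain the map-reduce
        // actions (Mapper1 stage -> Mapper2 stage -> ... -> reducer stage).
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/user/aji/mr-chain");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "jobtracker:8021");

        // Submit and start the workflow; Oozie then tracks each action's state.
        String jobId = oozie.run(conf);
        System.out.println("Started workflow " + jobId);
    }
}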
On Mon, Mar 4, 2013 at 8:17 AM, Justin Woody <[EMAIL PROTECTED]> wrote:

> Aji,
>
> Why don't you just chain the jobs together?
> http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
>
> Justin
>
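
A minimal sketch of that kind of chaining with the plain MapReduce API: run the first job into an intermediate directory, then point the second job's input at it. The mapper classes and paths here are illustrative stand-ins, not anything from the thread:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobs {

    // Stand-in for "Mapper1": tags each input line with its byte offset.
    public static class FirstMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text(Long.toString(key.get())), value);
        }
    }

    // Stand-in for "Mapper2": consumes what the first job wrote.
    public static class SecondMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text("stage2"), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]);   // scratch dir between the two jobs
        Path output = new Path(args[2]);

        Job job1 = Job.getInstance(conf, "stage-1");
        job1.setJarByClass(ChainedJobs.class);
        job1.setMapperClass(FirstMapper.class);
        job1.setNumReduceTasks(0);               // map-only stage
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);
        if (!job1.waitForCompletion(true)) {
            System.exit(1);
        }

        Job job2 = Job.getInstance(conf, "stage-2");
        job2.setJarByClass(ChainedJobs.class);
        job2.setMapperClass(SecondMapper.class);
        job2.setNumReduceTasks(0);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, output);
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}

With more stages, Hadoop's JobControl and ControlledJob classes can express the same job-to-job dependencies declaratively instead of the sequential waitForCompletion calls above.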
> On Mon, Mar 4, 2013 at 11:11 AM, Aji Janis <[EMAIL PROTECTED]> wrote:
> > Russell thanks for the link.
> >
> > I am interested in finding a solution (if one is out there) where Mapper1
> > outputs a custom object and Mapper2 can use that as input. One way to do
> > this, obviously, is by writing to Accumulo in my case. But is there
> > another solution for this:
> >
> > List<MyObject> ----> Input to Job
> >
> > MyObject ---> Input to Mapper1 (process MyObject) ----> Output <MyObjectId, MyObject>
> >
> > <MyObjectId, MyObject> are Input to Mapper2 ... and so on
> >
> >
> >
> > Ideas?
> >
> >
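
If the goal is just to run Mapper1 and Mapper2 back to back over the same records without persisting the intermediate <MyObjectId, MyObject> pairs anywhere, Hadoop's ChainMapper is worth a look. A rough sketch, assuming MyObjectId and MyObject are the poster's own Writable classes and Mapper1/Mapper2 are their mappers (none of which are shown in the thread); note that older Hadoop releases only ship the equivalent org.apache.hadoop.mapred.lib.ChainMapper for the old API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;

public class ChainedMapperDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "mapper1-then-mapper2");
        job.setJarByClass(ChainedMapperDriver.class);

        // Mapper1 reads the job input and emits <MyObjectId, MyObject>.
        ChainMapper.addMapper(job, Mapper1.class,
                LongWritable.class, Text.class,      // input types from the InputFormat
                MyObjectId.class, MyObject.class,    // Mapper1 output types
                new Configuration(false));

        // Mapper2 consumes Mapper1's pairs inside the same map task, so the
        // intermediate objects never hit Accumulo or HDFS.
        ChainMapper.addMapper(job, Mapper2.class,
                MyObjectId.class, MyObject.class,    // Mapper2 input = Mapper1 output
                MyObjectId.class, MyObject.class,    // Mapper2 output types
                new Configuration(false));

        // ... InputFormat, OutputFormat, reducer, etc. configured as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}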
> > On Mon, Mar 4, 2013 at 10:00 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> >>
> >>
> >>
> >> http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java
> >>
> >> AccumuloStorage for Pig comes with Accumulo. The easiest way would be to
> >> try it.
> >>
> >> Russell Jurney http://datasyndrome.com
> >>
> >> On Mar 4, 2013, at 5:30 AM, Aji Janis <[EMAIL PROTECTED]> wrote:
> >>
> >> Hello,
> >>
> >> I have an MR job design with a flow like this: Mapper1 -> Mapper2 ->
> >> Mapper3 -> Reducer1. Mapper1's input is an Accumulo table. M1's output
> >> goes to M2, and so on. Finally, the Reducer writes its output to Accumulo.
> >>
> >> Questions:
> >>
> >> 1) Has anyone tried something like this before? Are there any workflow
> >> control APIs (in or outside of Hadoop) that can help me set up a job
> >> like this? Or am I limited to using Quartz for this?
> >> 2) If both M2 and M3 needed to write some data to the same two tables in
> >> Accumulo, is that possible? Are there any good Accumulo MapReduce jobs
> >> you can point me to, or blogs/pages that I can use for reference
> >> (starting point/best practices)?
> >>
> >> Thank you in advance for any suggestions!
> >>
> >> Aji
> >>
> >
>
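
On question 2: AccumuloOutputFormat can write to more than one table from a single job, because the key the reducer (or mapper, in a map-only step) emits is a Text table name that routes each Mutation. A rough sketch of such a reducer, with placeholder table, column family, and qualifier names:

import java.io.IOException;

import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// For use with AccumuloOutputFormat: the output key names the destination
// table, so one step can write to both tables.
public class TwoTableReducer extends Reducer<Text, Text, Text, Mutation> {

    private static final Text TABLE_A = new Text("tableA");   // placeholder names
    private static final Text TABLE_B = new Text("tableB");

    @Override
    protected void reduce(Text rowId, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            Mutation m = new Mutation(rowId);
            m.put(new Text("cf"), new Text("cq"), new Value(value.toString().getBytes()));

            // The same mutation, routed to two tables by the output key.
            context.write(TABLE_A, m);
            context.write(TABLE_B, m);
        }
    }
}

The driver would set AccumuloOutputFormat as the job's output format and configure the Accumulo instance, credentials, and default table on it; those setter names differ between Accumulo releases, so the javadoc for the version in use is the reference. The MapReduce examples shipped with the Accumulo source tree are a reasonable starting point for question 2's request for sample jobs.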