Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce, mail # user - Programming Multiple rounds of mapreduce


Copy link to this message
-
Re: Programming Multiple rounds of mapreduce
Alejandro Abdelnur 2011-06-13, 22:13
Thanks Matt,

Arko, if you plan to use Oozie, you can have a simple coordinator job that
does does, for example (the following schedules a WF every 5 mins that
consumes the output produced by the previous run, you just have to have the
initial data)

Thxs.

Alejandro

----
<coordinator-app name="coord-1" frequency="${coord:minutes(5)}"
start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <controls>
    <concurrency>1</concurrency>
  </controls>

  <datasets>
    <dataset name="data" frequency="${coord:minutes(5)}"
initial-instance="${start}" timezone="UTC">

<uri-template>${nameNode}/user/${coord:user()}/examples/${dataRoot}/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}
      </uri-template>
    </dataset>
  </datasets>

  <input-events>
    <data-in name="input" dataset="data">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>

  <output-events>
    <data-out name="output" dataset="data">
      <instance>${coord:current(1)}</instance>
    </data-out>
  </output-events>

  <action>
    <workflow>

<app-path>${nameNode}/user/${coord:user()}/examples/apps/subwf-1</app-path>
      <configuration>
        <property>
          <name>jobTracker</name>
          <value>${jobTracker}</value>
        </property>
        <property>
          <name>nameNode</name>
          <value>${nameNode}</value>
        </property>
        <property>
          <name>queueName</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>examplesRoot</name>
          <value>${examplesRoot}</value>
        </property>
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
        <property>
          <name>outputDir</name>
          <value>${coord:dataOut('output')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
------

On Mon, Jun 13, 2011 at 3:01 PM, GOEKE, MATTHEW (AG/1000) <
[EMAIL PROTECTED]> wrote:

> If you know for certain that it needs to be split into multiple work units
> I would suggest looking into Oozie. Easy to install, light weight, low
> learning curve... for my purposes it's been very helpful so far. I am also
> fairly certain you can chain multiple job confs into the same run but I have
> not actually tried that therefore I can't promise it is easy or possible.
>
> http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-oozie/
>
> If you are not running CDH3u0 then you can also get the tarball and
> documentation directly here:
> https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs
>
> Matt
>
> -----Original Message-----
> From: Marcos Ortiz [mailto:[EMAIL PROTECTED]]
> Sent: Monday, June 13, 2011 4:57 PM
> To: [EMAIL PROTECTED]
> Cc: Arko Provo Mukherjee
> Subject: Re: Programming Multiple rounds of mapreduce
>
> Well, you can define a job for each round and then, you can define the
> running workflow based in your implementation and to chain your jobs
>
> El 6/13/2011 5:46 PM, Arko Provo Mukherjee escribió:
> > Hello,
> >
> > I am trying to write a program where I need to write multiple rounds
> > of map and reduce.
> >
> > The output of the last round of map-reduce must be fed into the input
> > of the next round.
> >
> > Can anyone please guide me to any link / material that can teach me as
> > to how I can achieve this.
> >
> > Thanks a lot in advance!
> >
> > Thanks & regards
> > Arko
>
> --
> Marcos Luís Ortíz Valmaseda
>  Software Engineer (UCI)
>  http://marcosluis2186.posterous.com
>  http://twitter.com/marcosluis2186
>
>
> This e-mail message may contain privileged and/or confidential information,
> and is intended to be received only by persons entitled
> to receive such information. If you have received this e-mail in error,
> please notify the sender immediately. Please delete it and
> all attachments from any servers, hard drives or any other media. Other use