Re: Programming Multiple rounds of mapreduce
Alejandro Abdelnur 2011-06-13, 22:13
Thanks Matt,

Arko, if you plan to use Oozie, you can have a simple coordinator job that does this. For example, the following schedules a WF every 5 mins that consumes the output produced by the previous run; you just have to provide the initial data.

Thxs.

Alejandro

----
<coordinator-app name="coord-1" frequency="${coord:minutes(5)}" start="${start}" end="${end}" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <controls>
    <concurrency>1</concurrency>
  </controls>
  <datasets>
    <dataset name="data" frequency="${coord:minutes(5)}" initial-instance="${start}" timezone="UTC">
      <uri-template>${nameNode}/user/${coord:user()}/examples/${dataRoot}/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="data">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <output-events>
    <data-out name="output" dataset="data">
      <instance>${coord:current(1)}</instance>
    </data-out>
  </output-events>
  <action>
    <workflow>
      <app-path>${nameNode}/user/${coord:user()}/examples/apps/subwf-1</app-path>
      <configuration>
        <property>
          <name>jobTracker</name>
          <value>${jobTracker}</value>
        </property>
        <property>
          <name>nameNode</name>
          <value>${nameNode}</value>
        </property>
        <property>
          <name>queueName</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>examplesRoot</name>
          <value>${examplesRoot}</value>
        </property>
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
        <property>
          <name>outputDir</name>
          <value>${coord:dataOut('output')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
------

On Mon, Jun 13, 2011 at 3:01 PM, GOEKE, MATTHEW (AG/1000) <[EMAIL PROTECTED]> wrote:

> If you know for certain that it needs to be split into multiple work units
> I would suggest looking into Oozie. Easy to install, light weight, low
> learning curve... for my purposes it's been very helpful so far.
> I am also fairly certain you can chain multiple job confs into the same
> run but I have not actually tried that therefore I can't promise it is
> easy or possible.
>
> http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-oozie/
>
> If you are not running CDH3u0 then you can also get the tarball and
> documentation directly here:
> https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs
>
> Matt
>
> -----Original Message-----
> From: Marcos Ortiz [mailto:[EMAIL PROTECTED]]
> Sent: Monday, June 13, 2011 4:57 PM
> To: [EMAIL PROTECTED]
> Cc: Arko Provo Mukherjee
> Subject: Re: Programming Multiple rounds of mapreduce
>
> Well, you can define a job for each round and then, you can define the
> running workflow based in your implementation and to chain your jobs
>
> On 6/13/2011 5:46 PM, Arko Provo Mukherjee wrote:
> > Hello,
> >
> > I am trying to write a program where I need to write multiple rounds
> > of map and reduce.
> >
> > The output of the last round of map-reduce must be fed into the input
> > of the next round.
> >
> > Can anyone please guide me to any link / material that can teach me as
> > to how I can achieve this.
> >
> > Thanks a lot in advance!
> >
> > Thanks & regards
> > Arko
>
> --
> Marcos Luís Ortíz Valmaseda
> Software Engineer (UCI)
> http://marcosluis2186.posterous.com
> http://twitter.com/marcosluis2186
>
>
> This e-mail message may contain privileged and/or confidential information,
> and is intended to be received only by persons entitled
> to receive such information. If you have received this e-mail in error,
> please notify the sender immediately. Please delete it and
> all attachments from any servers, hard drives or any other media. Other use