Arko Provo Mukherjee
2011-06-13, 21:46
Bibek Paudel
2011-06-13, 21:53
Marcos Ortiz
2011-06-13, 21:56
GOEKE, MATTHEW
2011-06-13, 22:01
Alejandro Abdelnur
2011-06-13, 22:13
Moustafa Gaber
2011-06-13, 22:30
Arko Provo Mukherjee
2011-06-13, 22:39
Moustafa Gaber
2011-06-14, 01:12
Sean Owen
2011-06-14, 06:28
-
Programming Multiple rounds of mapreduce
Arko Provo Mukherjee 2011-06-13, 21:46
Hello,
I am trying to write a program that needs multiple rounds of map and reduce. The output of each round of map-reduce must be fed into the input of the next round.

Can anyone please point me to any link or material that explains how I can achieve this?

Thanks a lot in advance!

Thanks & regards
Arko
-
Re: Programming Multiple rounds of mapreduce
Bibek Paudel 2011-06-13, 21:53
Hi,
The way I do it is:

create job1
job1 <-- feed all the configuration parameters (incl. input and output path) to this job
run job1
create job2
job2 <-- feed all config params (output of job1 as input, another path as output)
run job2
... and so on.

I think this is the recommended way of running multiple rounds of MR in Hadoop.

-b
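Bibek's recipe is easiest to see with a toy example. The sketch below is an in-memory simulation in plain Java, not Hadoop code: `ChainedRounds`, `runRound` and `wordsByCount` are hypothetical names, and the grouping step only stands in for Hadoop's shuffle and sort. Round 1 is a word count; round 2 takes round 1's output, unchanged, as its input and groups the words by how often they occur.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory simulation of chained map-reduce rounds (illustrative only, not the Hadoop API).
class ChainedRounds {

    /** One round: map every record to key/value pairs, group values by key (sorted), reduce each group. */
    static <I, K extends Comparable<K>, V, O> List<O> runRound(
            List<I> input,
            Function<I, List<Map.Entry<K, V>>> mapper,
            BiFunction<K, List<V>, O> reducer) {
        // Stand-in for the shuffle/sort phase: collect values per key in key order.
        SortedMap<K, List<V>> groups = new TreeMap<>();
        for (I rec : input) {
            for (Map.Entry<K, V> kv : mapper.apply(rec)) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        List<O> output = new ArrayList<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            output.add(reducer.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    /** Round 1 counts words; round 2 reads round 1's "word<TAB>count" output and groups words by count. */
    static List<String> wordsByCount(List<String> lines) {
        List<String> round1 = ChainedRounds.<String, String, Integer, String>runRound(
                lines,
                line -> Arrays.stream(line.split("\\s+"))
                        .filter(w -> !w.isEmpty())
                        .map(w -> Map.entry(w, 1))
                        .collect(Collectors.toList()),
                (word, ones) -> word + "\t" + ones.size());

        // The input of round 2 is exactly the output of round 1.
        return ChainedRounds.<String, Integer, String, String>runRound(
                round1,
                rec -> {
                    String[] parts = rec.split("\t");
                    return List.of(Map.entry(Integer.parseInt(parts[1]), parts[0]));
                },
                (count, words) -> count + "\t" + String.join(",", words));
    }
}
```

For example, `wordsByCount(List.of("a b", "c a", "d"))` yields `["1\tb,c,d", "2\ta"]`. In real Hadoop each call to `runRound` would be a separate job whose output directory becomes the next job's input directory.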
-
Re: Programming Multiple rounds of mapreduce
Marcos Ortiz 2011-06-13, 21:56
Well, you can define a job for each round, and then define the running workflow, based on your implementation, that chains your jobs.

--
Marcos Luís Ortíz Valmaseda
Software Engineer (UCI)
http://marcosluis2186.posterous.com
http://twitter.com/marcosluis2186
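One concrete piece of "define a job for each round" is deciding where each round reads and writes. The helper below is a hypothetical sketch (the class `RoundPlanner` and directory names such as `tmp/round-0` are made up for illustration): every round except the last writes to its own scratch directory, and each round's input is the previous round's output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plans the input/output directory of every round so that
// round i reads what round i-1 wrote. Directory naming is illustrative only.
class RoundPlanner {

    /** Input and output directory of one round. */
    record Round(int index, String inputDir, String outputDir) {}

    static List<Round> plan(String inputDir, String workDir, String finalOutputDir, int rounds) {
        if (rounds < 1) {
            throw new IllegalArgumentException("need at least one round");
        }
        List<Round> plan = new ArrayList<>();
        String in = inputDir;
        for (int i = 0; i < rounds; i++) {
            // The last round writes to the final location; earlier rounds use scratch directories.
            String out = (i == rounds - 1) ? finalOutputDir : workDir + "/round-" + i;
            plan.add(new Round(i, in, out));
            in = out; // chain: the next round reads this round's output
        }
        return plan;
    }
}
```

In a real driver using the `org.apache.hadoop.mapreduce` API, each planned round would become one `Job` configured with `FileInputFormat.addInputPath` and `FileOutputFormat.setOutputPath`; the driver calls `job.waitForCompletion(true)` and stops if it returns false before starting the next round. Separate directories per round matter because `FileOutputFormat` refuses to write into an output directory that already exists.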
-
RE: Programming Multiple rounds of mapreduce
GOEKE, MATTHEW 2011-06-13, 22:01
If you know for certain that it needs to be split into multiple work units, I would suggest looking into Oozie. It is easy to install, lightweight, and has a low learning curve; for my purposes it's been very helpful so far. I am also fairly certain you can chain multiple job confs into the same run, but I have not actually tried that, so I can't promise it is easy or possible.

http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-oozie/

If you are not running CDH3u0, you can also get the tarball and documentation directly here: https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs

Matt
-
Re: Programming Multiple rounds of mapreduce
Alejandro Abdelnur 2011-06-13, 22:13
Thanks Matt,
Arko, if you plan to use Oozie, you can have a simple coordinator job that does this. For example, the following schedules a workflow (WF) every 5 minutes that consumes the output produced by the previous run; you just have to provide the initial data.

Thxs.
Alejandro

----
<coordinator-app name="coord-1" frequency="${coord:minutes(5)}" start="${start}" end="${end}" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <controls>
    <concurrency>1</concurrency>
  </controls>
  <datasets>
    <dataset name="data" frequency="${coord:minutes(5)}" initial-instance="${start}" timezone="UTC">
      <uri-template>${nameNode}/user/${coord:user()}/examples/${dataRoot}/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="data">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <output-events>
    <data-out name="output" dataset="data">
      <instance>${coord:current(1)}</instance>
    </data-out>
  </output-events>
  <action>
    <workflow>
      <app-path>${nameNode}/user/${coord:user()}/examples/apps/subwf-1</app-path>
      <configuration>
        <property>
          <name>jobTracker</name>
          <value>${jobTracker}</value>
        </property>
        <property>
          <name>nameNode</name>
          <value>${nameNode}</value>
        </property>
        <property>
          <name>queueName</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>examplesRoot</name>
          <value>${examplesRoot}</value>
        </property>
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
        <property>
          <name>outputDir</name>
          <value>${coord:dataOut('output')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
------
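To make the chaining in this coordinator concrete, here is a simplified Java model of how its dataset instances line up, assuming the nominal time of run n is start + 5n minutes (matching the 5-minute frequency and initial-instance above), so that `coord:current(0)` is the instance at that time and `coord:current(1)` is the next one. `CoordinatorChain`, `instance`, and the `datasetRoot` parameter (standing in for `${nameNode}/user/${coord:user()}/examples/${dataRoot}`) are hypothetical; this is not Oozie code and it ignores Oozie's check that the input instance exists before a run starts.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Simplified model of the coordinator above (not Oozie code): one dataset instance
// every 5 minutes; each run reads current(0) and writes current(1).
class CoordinatorChain {
    // Mirrors ${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE} in the uri-template (UTC assumed).
    static final DateTimeFormatter INSTANCE_FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm");
    static final int FREQUENCY_MINUTES = 5;

    /** Path of dataset instance current(offset) as seen by the run with the given index. */
    static String instance(String datasetRoot, LocalDateTime start, int runIndex, int offset) {
        LocalDateTime t = start.plusMinutes((long) FREQUENCY_MINUTES * (runIndex + offset));
        return datasetRoot + "/" + INSTANCE_FMT.format(t);
    }

    static String inputDir(String datasetRoot, LocalDateTime start, int runIndex) {
        return instance(datasetRoot, start, runIndex, 0); // coord:current(0)
    }

    static String outputDir(String datasetRoot, LocalDateTime start, int runIndex) {
        return instance(datasetRoot, start, runIndex, 1); // coord:current(1)
    }
}
```

With `start` at 2011-06-13 22:00, run 0 reads `.../2011-06-13-22-00` and writes `.../2011-06-13-22-05`, which is exactly what run 1 reads. Only the very first instance has to be provided by hand, which is the "initial data" mentioned above.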
-
Re: Programming Multiple rounds of mapreduce
Moustafa Gaber 2011-06-13, 22:30
I think HaLoop is a framework which can answer your question:
http://code.google.com/p/haloop/

--
Best Regards,
Mostafa Ead
-
Re: Programming Multiple rounds of mapreduce
Arko Provo Mukherjee 2011-06-13, 22:39
Hello,
Thanks everyone for your responses. I am new to Hadoop, so this was a lot of new information for me. I will surely go through all of these.

However, I was actually hoping that someone could point me to some example code where multiple rounds of map-reduce have been used. Please let me know if anyone has any such examples, as they are the best way for me to learn :-)

Thanks much!
Cheers
Arko
-
Re: Programming Multiple rounds of mapreduce
Moustafa Gaber 2011-06-14, 01:12
Actually, HaLoop is a new framework built on top of Hadoop that targets iterative algorithms such as transitive closure. These algorithms consist of multiple rounds of Hadoop jobs, so I think it may contain some useful examples for you.

--
Best Regards,
Mostafa Ead
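Transitive closure shows why such algorithms need an unknown number of rounds: each round extends the known reachable pairs by one more edge, and the driver stops only when a round adds nothing new. The sketch below is an in-memory Java simulation, not HaLoop or Hadoop code; `TransitiveClosure`, `oneRound`, and `closure` are hypothetical names. Each round is written as a reduce-side join keyed on the shared middle node, and the base edge set is loop-invariant input that gets re-read on every round.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory sketch of iterative map-reduce: repeat join rounds until a fixpoint.
class TransitiveClosure {

    /** A directed pair meaning "from can reach to". */
    record Edge(int from, int to) {}

    /** The final closure plus the number of rounds the driver launched. */
    record Result(Set<Edge> pairs, int rounds) {}

    /** One round: join reachable pairs (a,b) with base edges (b,c) on b to produce (a,c). */
    static Set<Edge> oneRound(Set<Edge> reach, Set<Edge> base) {
        // "Map" phase: bucket both relations by the join key (the middle node b).
        Map<Integer, List<Integer>> sourcesInto = new HashMap<>(); // b -> every a with (a,b) reachable
        Map<Integer, List<Integer>> targetsFrom = new HashMap<>(); // b -> every c with edge (b,c)
        for (Edge e : reach) {
            sourcesInto.computeIfAbsent(e.to(), k -> new ArrayList<>()).add(e.from());
        }
        for (Edge e : base) {
            targetsFrom.computeIfAbsent(e.from(), k -> new ArrayList<>()).add(e.to());
        }
        // "Reduce" phase: for each key b, pair every a with every c.
        Set<Edge> out = new HashSet<>(reach);
        for (Map.Entry<Integer, List<Integer>> group : sourcesInto.entrySet()) {
            List<Integer> targets = targetsFrom.getOrDefault(group.getKey(), List.of());
            for (int a : group.getValue()) {
                for (int c : targets) {
                    out.add(new Edge(a, c));
                }
            }
        }
        return out;
    }

    /** Driver loop: the output of each round is the input of the next, until nothing changes. */
    static Result closure(Set<Edge> base) {
        Set<Edge> reach = new HashSet<>(base);
        int rounds = 0;
        while (true) {
            Set<Edge> next = oneRound(reach, base);
            rounds++;
            if (next.equals(reach)) {
                return new Result(reach, rounds); // fixpoint reached
            }
            reach = next;
        }
    }
}
```

For the chain 1 -> 2 -> 3 -> 4 the loop runs three rounds: two that add new pairs and a final one that confirms nothing changed. In plain Hadoop that convergence check would live in the driver, for instance by comparing a counter reported by the finished job.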
-
Re: Programming Multiple rounds of mapreduce
Sean Owen 2011-06-14, 06:28
You could have a look at the MapReduce pipelines in Apache Mahout (http://mahout.apache.org). See for instance org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. This shows how Mahout, in most places, constructs and runs a series of MapReduce rounds to accomplish a task. Each job feeds into one or more of the later rounds.

It is at least an example of getting it done in straight Hadoop, though workflow systems like Oozie et al. are probably the kinds of things you want to look at now.
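When one job's output feeds several later rounds, the jobs form a dependency graph rather than a straight chain, and the driver must start each job only after all of its inputs exist. The sketch below orders such a graph with Kahn's topological sort; it is a hypothetical illustration (the class `JobGraph` and job names like `parse` or `combine` are made up and do not describe Mahout's actual jobs). Hadoop itself also offers a `JobControl` class, where `ControlledJob.addDependingJob` declares the same kind of dependency.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical driver-side ordering of a job DAG using Kahn's algorithm.
class JobGraph {

    /** dependsOn maps each job name to the jobs whose output it reads. */
    static List<String> runOrder(Map<String, List<String>> dependsOn) {
        Map<String, Integer> pending = new TreeMap<>();        // job -> number of unfinished inputs
        Map<String, List<String>> consumers = new TreeMap<>(); // job -> jobs that read its output
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                consumers.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Jobs with no unfinished inputs can start immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job); // a real driver would submit the job here and wait for it to succeed
            for (String next : consumers.getOrDefault(job, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next); // all of next's inputs are now available
                }
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("cycle between jobs");
        }
        return order;
    }
}
```

With `parse` feeding both `countA` and `countB`, and `combine` reading both counts, the order starts with `parse` and ends with `combine`; a cycle between jobs is reported as an error instead of hanging.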