Yaron Gonen (2011-10-04, 15:45)
Joey Echeverria (2011-10-04, 16:05)
GOEKE, MATTHEW (2011-10-04, 16:05)
GOEKE, MATTHEW (2011-10-04, 16:11)
Joey Echeverria (2011-10-04, 16:28)
Yaron Gonen (2011-10-05, 13:50)
-
Submitting a Hadoop task from within a reducer
Yaron Gonen (2011-10-04, 15:45)
Hi,
Hadoop jobs are always chained into a linear, user-managed workflow (a reduce step cannot start before all of the preceding map tasks have finished, etc.). This can be problematic for recursive tasks: in a BFS, for example, we will not get any output until the longest branch has been reached. To solve that, one idea is to submit a whole Hadoop job from within a reducer. Has anyone tried it? Thanks.
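To make the BFS point concrete: when BFS is expressed as a chain of MapReduce jobs, each job can only advance the frontier by one hop, so the final distances exist only after as many jobs as the longest shortest path, plus one more to notice that nothing changed. The following is a minimal in-memory sketch under that assumption. It does not use the Hadoop API; `BfsRounds`, `round` and `bfs` are hypothetical names, and each call to `round` merely stands in for one complete job over the whole graph.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of BFS run as chained MapReduce jobs (no Hadoop API).
 * Each call to round() stands in for one complete job over the whole graph.
 */
public class BfsRounds {

    /** One simulated job; returns true if any node's distance changed. */
    static boolean round(Map<String, List<String>> graph, Map<String, Integer> dist) {
        // "Map" phase: every reached node emits (neighbor, distance + 1).
        Map<String, Integer> emitted = new HashMap<>();
        for (Map.Entry<String, Integer> e : dist.entrySet()) {
            for (String neighbor : graph.getOrDefault(e.getKey(), List.of())) {
                emitted.merge(neighbor, e.getValue() + 1, Integer::min);
            }
        }
        // "Reduce" phase: keep the smallest distance seen for each node.
        boolean changed = false;
        for (Map.Entry<String, Integer> e : emitted.entrySet()) {
            Integer old = dist.get(e.getKey());
            if (old == null || e.getValue() < old) {
                dist.put(e.getKey(), e.getValue());
                changed = true;
            }
        }
        return changed;
    }

    /** Driver loop: runs rounds until nothing changes; returns how many rounds ran. */
    static int bfs(Map<String, List<String>> graph, String source, Map<String, Integer> dist) {
        dist.put(source, 0);
        int rounds = 0;
        boolean changed;
        do {
            changed = round(graph, dist);
            rounds++;
        } while (changed);
        return rounds;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", List.of("b", "e"));
        graph.put("b", List.of("c"));
        graph.put("c", List.of("d"));
        Map<String, Integer> dist = new HashMap<>();
        int rounds = bfs(graph, "a", dist);
        // prints "4 rounds, dist(d) = 3": three rounds to reach depth 3, one to confirm nothing changed
        System.out.println(rounds + " rounds, dist(d) = " + dist.get("d"));
    }
}
```

The short branch to `e` is settled after the first round, but the driver cannot declare the result final until the deepest node `d` has been reached, which is exactly the latency described above.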
-
Re: Submitting a Hadoop task from within a reducer
Joey Echeverria (2011-10-04, 16:05)
You may want to check out Yarn, coming in Hadoop 0.23:
https://issues.apache.org/jira/browse/MAPREDUCE-279

-Joey

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
-
RE: Submitting a Hadoop task from within a reducer
GOEKE, MATTHEW (2011-10-04, 16:05)
As long as your reduce task can kick off the MR job asynchronously, it shouldn't be too much of an issue; otherwise it could very quickly result in a deadlock. If you set this up as two stages, 1) kick off the recursive MR and 2) analyze the final result set, then it should work. Off the top of my head, though, I'm not sure how to sync when the second-stage job runs, unless you can just watch for a set of files to appear.
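The deadlock warning can be illustrated with a standard-library analogy. In MRv1 each TaskTracker offers a fixed number of reduce slots, so a reducer that synchronously waits for a child job (for example via `Job.waitForCompletion(true)`) keeps its slot while the child may need one. The sketch below uses a single-thread `ExecutorService` as a stand-in for one slot; it is not Hadoop code, and `SlotStarvation`, `blockingParent` and `asyncParent` are illustrative names. The asynchronous variant loosely mirrors calling `Job.submit()`, which returns without waiting, and tracking completion somewhere other than the submitting task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Analogy only: a one-thread pool stands in for a single task slot.
 * Not Hadoop code; all names here are illustrative.
 */
public class SlotStarvation {

    /** Parent submits a child to the same pool and blocks on it while holding the only slot. */
    static boolean blockingParent(long waitMillis) throws Exception {
        ExecutorService slots = Executors.newFixedThreadPool(1);
        try {
            Callable<Boolean> parent = () -> {
                Future<String> child = slots.submit(() -> "child done");
                try {
                    child.get(waitMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false; // the child never got the slot the parent is holding
                }
            };
            return slots.submit(parent).get();
        } finally {
            slots.shutdownNow();
        }
    }

    /** Parent only submits the child and returns its handle, releasing the slot. */
    static boolean asyncParent(long waitMillis) throws Exception {
        ExecutorService slots = Executors.newFixedThreadPool(1);
        try {
            Callable<Future<String>> parent = () -> slots.submit(() -> "child done");
            Future<String> child = slots.submit(parent).get();
            // Completion is tracked outside the parent task, so the child can get the slot.
            return "child done".equals(child.get(waitMillis, TimeUnit.MILLISECONDS));
        } finally {
            slots.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking parent, child finished: " + blockingParent(200));
        System.out.println("async parent, child finished: " + asyncParent(2000));
    }
}
```

With one slot the blocking variant always times out, while the asynchronous one completes; on a real cluster the blocking case only stalls once every available reduce slot is held by a waiting parent, which is why it can appear suddenly as recursion deepens.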
Matt

This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited.

All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its subsidiaries. The recipient of this e-mail is solely responsible for checking for the presence of "Viruses" or other "Malware". Monsanto, along with its subsidiaries, accepts no liability for any damage caused by any such code transmitted by or accompanying this e-mail or any attachment.

The information contained in this email may be subject to the export control laws and regulations of the United States, potentially including but not limited to the Export Administration Regulations (EAR) and sanctions regulations issued by the U.S. Department of Treasury, Office of Foreign Asset Controls (OFAC). As a recipient of this information you are obligated to comply with all applicable U.S. export laws and regulations.
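Matt's closing suggestion, watching for a set of files to appear before the second stage runs, could be sketched as a polling loop. Hadoop's `FileOutputCommitter` writes an empty `_SUCCESS` marker into a job's output directory when the job succeeds (in releases that support it; the behaviour is governed by `mapreduce.fileoutputcommitter.marksuccessfuljobs`), so a driver could wait for one marker per recursive job. This sketch assumes a local filesystem via `java.nio.file`; against HDFS the existence check would go through `org.apache.hadoop.fs.FileSystem#exists` instead. `MarkerWatcher`, `awaitMarkers` and the `level-N` directory layout are hypothetical, and the sketch assumes the set of expected output directories is known up front, which a truly recursive job tree would have to publish somewhere.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/**
 * Hypothetical second-stage gate: wait until every expected marker file
 * exists, or give up when the timeout expires. Local filesystem only.
 */
public class MarkerWatcher {

    /** Returns true once all markers exist, false if the deadline passes first. */
    static boolean awaitMarkers(List<Path> markers, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (true) {
            // A missing marker means that job has not (successfully) finished yet.
            if (markers.stream().allMatch(Files::exists)) {
                return true;
            }
            if (Instant.now().isAfter(deadline)) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical layout: one output directory per recursive job, each with a _SUCCESS marker.
        Path root = Files.createTempDirectory("bfs-out");
        Path level1 = Files.createDirectories(root.resolve("level-1"));
        Path level2 = Files.createDirectories(root.resolve("level-2"));
        List<Path> markers = List.of(level1.resolve("_SUCCESS"), level2.resolve("_SUCCESS"));

        Files.createFile(markers.get(0)); // only the first job has finished
        System.out.println(awaitMarkers(markers, Duration.ofMillis(100), Duration.ofMillis(20)));

        Files.createFile(markers.get(1)); // now both have
        System.out.println(awaitMarkers(markers, Duration.ofMillis(100), Duration.ofMillis(20)));
    }
}
```

Polling keeps the second stage decoupled from the recursive jobs, at the cost of a fixed poll interval and a timeout that must be chosen generously enough for the slowest branch.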
-
RE: Submitting a Hadoop task from within a reducer
GOEKE, MATTHEW (2011-10-04, 16:11)
Joey,
Is Yarn just a synonym for MRv2? And if so, wouldn't he still have to create a custom application master for his job type?

Matt
-
Re: Submitting a Hadoop task from within a reducer
Joey Echeverria (2011-10-04, 16:28)
Yes. I pointed him to it because it seems like he's trying to do something with Hadoop for which MapReduce may not be the right execution model. Yarn/MRv2 gives you the ability to try other execution models. As you pointed out, it may require some extra development, but it is more flexible than straight MapReduce and probably a better option in the long run than submitting jobs from inside of jobs.

-Joey
-
Re: Submitting a Hadoop task from within a reducer
Yaron Gonen (2011-10-05, 13:50)
Thanks, guys. I'll dig into Yarn a little.