-
Can't achieve load distribution
Mark Kerzner 2012-02-02, 00:21
Hi,

I have a simple MR job, and I want each Mapper to get one line from my input file (which contains further instructions for lengthy processing). Each line is 100 characters long, and I tell Hadoop to read only 100 bytes:

job.getConfiguration().setInt("mapreduce.input.linerecordreader.line.maxlength", 100);

I see that this part works: it reads only one line at a time, and if I change this parameter, it listens.

However, on a cluster only one node receives all the map tasks. Only one map task is started. The others never get anything; they just wait. I've added a 100-second wait to the mapper, with no change!

Any advice?

Thank you. Sincerely,
Mark
-
Re: Can't achieve load distribution
Anil Gupta 2012-02-02, 01:03
Do you have enough data to start more than one mapper? If the entire input is smaller than one block, only one mapper will run.

Best Regards,
Anil
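The rule of thumb above can be sketched in plain Java. This is a simplified model of how file-based input formats count splits for a single splittable file, not Hadoop code: the class and method names (`SplitEstimator`, `splitSize`, `countSplits`) are made up for illustration, and the 10% slack factor is an assumption taken from the behaviour of Hadoop's `FileInputFormat`.

```java
/** Simplified model of split counting for one splittable input file. */
class SplitEstimator {
    // Assumed slack: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /** Effective split size: the block size clamped to [minSize, maxSize]. */
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Number of splits (and therefore map tasks) for a file of the given length. */
    static int countSplits(long fileLength, long splitSize) {
        if (fileLength <= 0) {
            return 0; // empty files are outside the scope of this sketch
        }
        int splits = 0;
        long remaining = fileLength;
        // Carve off full-size splits while clearly more than one split remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left becomes the final split.
        if (remaining > 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024;
        long size = splitSize(block, 1L, Long.MAX_VALUE);
        // Ten instruction lines of 100 characters plus a newline each.
        System.out.println(countSplits(10 * 101L, size));
        // A 200 MB file for comparison.
        System.out.println(countSplits(200L * 1024 * 1024, size));
    }
}
```

Under these assumptions a 1,010-byte instruction file yields a single split, which matches the single map task described in the question, while a 200 MB file yields four. Limiting the record length does not change the count, because it affects how records are read inside a split and not how many splits exist.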
-
Re: Can't achieve load distribution
Mark Kerzner 2012-02-02, 01:06
Anil,

do you mean one block of HDFS, like 64MB?

Mark
-
Re: Can't achieve load distribution
Anil Gupta 2012-02-02, 01:44
Yes, if your block size is 64 MB. By the way, the block size is configurable in Hadoop.

Best Regards,
Anil
-
Re: Can't achieve load distribution
Mark Kerzner 2012-02-02, 04:13
Thanks!

Mark
-
Re: Can't achieve load distribution
Praveen Sripati 2012-02-02, 13:39
> I have a simple MR job, and I want each Mapper to get one line from my input file (which contains further instructions for lengthy processing).

Use the NLineInputFormat class.

http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html

Praveen
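To make the idea behind an N-line input format concrete, here is a plain-Java sketch that cuts a buffer into byte ranges of N lines each, so that every range could be handed to its own mapper. It is a model of the technique only and does not use Hadoop: `LineSplitSketch` and `splitByLines` are hypothetical names, and each returned `long[]` is assumed to mean `{start offset, length in bytes}`.

```java
import java.util.ArrayList;
import java.util.List;

/** Model of N-line splitting: one byte range per group of N input lines. */
class LineSplitSketch {

    /**
     * Returns {start, length} byte ranges, each covering linesPerSplit lines.
     * The final range may cover fewer lines, or an unterminated last line.
     */
    static List<long[]> splitByLines(byte[] data, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be >= 1");
        }
        List<long[]> splits = new ArrayList<>();
        long start = 0;
        int linesInCurrent = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                linesInCurrent++;
                if (linesInCurrent == linesPerSplit) {
                    // Close the range just after the newline that ends line N.
                    splits.add(new long[] {start, i + 1 - start});
                    start = i + 1;
                    linesInCurrent = 0;
                }
            }
        }
        // Remaining bytes form the last, possibly shorter, range.
        if (start < data.length) {
            splits.add(new long[] {start, data.length - start});
        }
        return splits;
    }
}
```

With `linesPerSplit` set to 1, a file of ten instruction lines produces ten ranges, hence ten map tasks regardless of block size. In the new-API class linked above, the corresponding setup is believed to be `job.setInputFormatClass(NLineInputFormat.class)` followed by `NLineInputFormat.setNumLinesPerSplit(job, 1)`; check this against the Javadoc of the release actually in use.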
-
Re: Can't achieve load distribution
Mark Kerzner 2012-02-02, 19:55
Praveen,

this seems just like the right thing, but it's API 0.21 (I googled about the problems with it), so I have to use either the next Cloudera release, or Hortonworks, or something, am I right?

Mark
-
Re: Can't achieve load distribution
Praveen Sripati 2012-02-03, 01:38
Mark,

NLineInputFormat was not introduced in 0.21; I only sent the 0.21 URL as a reference, FYI. It's in the 0.20.205, 1.0.0 and 0.23 releases also.

Praveen
-
Re: Can't achieve load distribution
Harsh J 2012-02-03, 03:55
New API NLineInputFormat is only available from 1.0.1, and not in any of the earlier 1 (1.0.0) or 0.20 (0.20.x, 0.20.xxx) vanilla Apache releases.

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
-
Re: Can't achieve load distribution
Mark Kerzner 2012-02-03, 04:25
And that is exactly what I found.

I have a "hack" for now - give all files on the command line - and I will wait for the next release in some distribution.

Thank you,
Mark
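One possible reading of the "hack" above, offered as an assumption and not as what was actually run: since file-based splits do not span files, writing each instruction line to its own small file and passing all of those files as input paths should give one map task per line. A minimal sketch of the preparation step follows; `OneFilePerLine`, `explode` and the `part-NNNNN.txt` naming are hypothetical, and the files are written to the local filesystem, so they would still need to be copied to HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Writes every non-empty line of an input file to its own small file. */
class OneFilePerLine {

    /** Returns the paths of the files written, in input order. */
    static List<Path> explode(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();
        int index = 0;
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            if (line.isEmpty()) {
                continue; // blank lines carry no instructions
            }
            // Hypothetical naming scheme: part-00000.txt, part-00001.txt, ...
            Path out = outDir.resolve(String.format("part-%05d.txt", index++));
            Files.write(out, (line + "\n").getBytes(StandardCharsets.UTF_8));
            written.add(out);
        }
        return written;
    }
}
```

The returned paths can then be listed as the job's input arguments. This trades one small file per task for not needing NLineInputFormat, which is reasonable for a handful of long-running tasks but would scale poorly to many thousands of lines.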