Re: Can't achieve load distribution
And that is exactly what I found.

I have a "hack" for now - give all files on the command line - and I will
wait for the next release in some distribution.

Thank you,
Mark

On Thu, Feb 2, 2012 at 9:55 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> New API NLineInputFormat is only available from 1.0.1, and not in any
> of the earlier 1 (1.0.0) or 0.20 (0.20.x, 0.20.xxx) vanilla Apache
> releases.
>
> On Fri, Feb 3, 2012 at 7:08 AM, Praveen Sripati
> <[EMAIL PROTECTED]> wrote:
> > Mark,
> >
> > NLineInputFormat was not introduced in 0.21; I just sent the 0.21 URL
> > for reference. It's in the 0.20.205, 1.0.0 and 0.23 releases also.
> >
> > Praveen
> >
> > On Fri, Feb 3, 2012 at 1:25 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> >
> >> Praveen,
> >>
> >> this seems like just the right thing, but it's the 0.21 API (I googled
> >> about the problems with it), so I have to use either the next Cloudera
> >> release, or Hortonworks, or something, am I right?
> >>
> >> Mark
> >>
> >> On Thu, Feb 2, 2012 at 7:39 AM, Praveen Sripati <[EMAIL PROTECTED]> wrote:
> >>
> >> > > I have a simple MR job, and I want each Mapper to get one line from my
> >> > > input file (which contains further instructions for lengthy processing).
> >> >
> >> > Use the NLineInputFormat class.
> >> >
> >> > http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html
> >> >
> >> > Praveen
> >> >
> >> > On Thu, Feb 2, 2012 at 9:43 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > Thanks!
> >> > > Mark
> >> > >
> >> > > On Wed, Feb 1, 2012 at 7:44 PM, Anil Gupta <[EMAIL PROTECTED]> wrote:
> >> > >
> >> > > > Yes, if your block size is 64 MB. Btw, block size is configurable in
> >> > > > Hadoop.
> >> > > >
> >> > > > Best Regards,
> >> > > > Anil
> >> > > >
> >> > > > On Feb 1, 2012, at 5:06 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> >> > > >
> >> > > > > Anil,
> >> > > > >
> >> > > > > do you mean one block of HDFS, like 64MB?
> >> > > > >
> >> > > > > Mark
> >> > > > >
> >> > > > > On Wed, Feb 1, 2012 at 7:03 PM, Anil Gupta <[EMAIL PROTECTED]> wrote:
> >> > > > >
> >> > > > >> Do you have enough data to start more than one mapper?
> >> > > > >> If the entire input is smaller than one block, then only 1 mapper
> >> > > > >> will run.
> >> > > > >>
> >> > > > >> Best Regards,
> >> > > > >> Anil
> >> > > > >>
> >> > > > >> On Feb 1, 2012, at 4:21 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> >> > > > >>
> >> > > > >>> Hi,
> >> > > > >>>
> >> > > > >>> I have a simple MR job, and I want each Mapper to get one line
> >> > > > >>> from my input file (which contains further instructions for lengthy
> >> > > > >>> processing). Each line is 100 characters long, and I tell Hadoop to
> >> > > > >>> read only 100 bytes,
> >> > > > >>>
> >> > > > >>> job.getConfiguration().setInt("mapreduce.input.linerecordreader.line.maxlength", 100);
> >> > > > >>>
> >> > > > >>> I see that this part works - it reads only one line at a time,
> >> > > > >>> and if I change this parameter, it listens.
> >> > > > >>>
> >> > > > >>> However, on a cluster only one node receives all the map tasks.
> >> > > > >>> Only one map task is started. The other nodes never get anything,
> >> > > > >>> they just wait. I've added a 100-second wait to the mapper - no
> >> > > > >>> change!
> >> > > > >>>
> >> > > > >>> Any advice?
> >> > > > >>>
> >> > > > >>> Thank you. Sincerely,
> >> > > > >>> Mark
>
>
>
> --
> Harsh J
> Customer Ops. Engineer
> Cloudera | http://tiny.cloudera.com/about
>
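
For anyone reading this thread later, here is a rough sketch of the arithmetic behind the answers above. It is plain Java with no Hadoop dependency, and it is a deliberate simplification: the class and method names (SplitSketch, blockSplits, lineSplits) are made up for illustration, and real Hadoop split calculation also takes minimum/maximum split sizes and other details into account. The point it tries to show is that a line-length limit on the record reader does not change how many splits (and so how many map tasks) a small file produces, while a one-line-per-split input format such as NLineInputFormat does. The 50-line file size is an assumed example, not a figure from the thread.

```java
// Simplified model of how many map tasks an input produces.
// Assumption: one split per started block for block-based splitting,
// and one split per group of N lines for line-based splitting.
final class SplitSketch {

    // Block-based splitting: a file smaller than one block gives one split.
    // The sketch treats empty input as "no work" and returns 0.
    static long blockSplits(long fileBytes, long blockBytes) {
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    // Line-based splitting: every linesPerSplit lines form one split.
    static long lineSplits(long totalLines, int linesPerSplit) {
        if (totalLines <= 0) {
            return 0;
        }
        return (totalLines + linesPerSplit - 1) / linesPerSplit; // ceiling division
    }

    public static void main(String[] args) {
        long blockBytes = 64L * 1024 * 1024; // 64 MB block, as mentioned in the thread
        long totalLines = 50;                // hypothetical number of instruction lines
        long fileBytes = totalLines * 101;   // 100 characters plus a newline per line

        System.out.println("block-based map tasks: " + blockSplits(fileBytes, blockBytes));
        System.out.println("one-line-per-split map tasks: " + lineSplits(totalLines, 1));
    }
}
```

Under these assumptions the small file yields a single block-based split, which matches the "only one map task is started" symptom, while one line per split yields one task per line. Passing each file separately on the command line works for a similar reason: every input file contributes at least one split of its own.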