
Hadoop user mailing list: Can't achieve load distribution


Re: Can't achieve load distribution
Praveen,

This seems like exactly the right thing, but it's in the 0.21 API (I googled
the problems with it), so I would have to use either the next Cloudera
release, or Hortonworks, or something similar. Am I right?

Mark

On Thu, Feb 2, 2012 at 7:39 AM, Praveen Sripati <[EMAIL PROTECTED]> wrote:

> > I have a simple MR job, and I want each Mapper to get one line from my
> > input file (which contains further instructions for lengthy processing).
>
> Use the NLineInputFormat class.
>
>
> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html
>
> Praveen
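For readers unfamiliar with `NLineInputFormat`: per the Javadoc linked above, it builds each input split from a fixed number of lines (N) rather than from a byte range, so with N = 1 every line becomes its own split and therefore its own map task. The sketch below is a minimal plain-Java model of that grouping only, not Hadoop's implementation (the real class records byte offsets into the file); the class and method names are hypothetical. In the new (0.21) API the real knobs are `NLineInputFormat.setNumLinesPerSplit(job, n)` and `job.setInputFormatClass(NLineInputFormat.class)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of how NLineInputFormat groups input lines into splits.
// NOT Hadoop's code: it only illustrates the "N lines per split" idea.
class NLineSplitSketch {

    // Groups lines into consecutive chunks of at most n lines; each chunk
    // stands in for one input split, i.e. one map task.
    static List<List<String>> splitLines(List<String> lines, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("task-1", "task-2", "task-3", "task-4", "task-5");
        // With n = 1, every instruction line becomes its own split.
        System.out.println(splitLines(lines, 1).size()); // 5
        // With n = 2, the last split holds the single leftover line.
        System.out.println(splitLines(lines, 2)); // [[task-1, task-2], [task-3, task-4], [task-5]]
    }
}
```

Because each of Mark's lines triggers lengthy processing, N = 1 gives the scheduler one task per line to spread across nodes, at the cost of some per-task startup overhead.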
>
> On Thu, Feb 2, 2012 at 9:43 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
>
> > Thanks!
> > Mark
> >
> > On Wed, Feb 1, 2012 at 7:44 PM, Anil Gupta <[EMAIL PROTECTED]> wrote:
> >
> > > Yes, if your block size is 64 MB. By the way, the block size is
> > > configurable in Hadoop.
> > >
> > > Best Regards,
> > > Anil
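To put the 64 MB figure in perspective for Mark's input: assuming each record is 100 characters plus a one-byte newline (101 bytes, single-byte encoding assumed), one 64 MB block holds over 660,000 such lines, so a file of a few thousand lines fits easily in a single block. A small sketch of the arithmetic (the class name is hypothetical):

```java
// Hypothetical helper: how many fixed-length lines fit in one HDFS block.
class BlockCapacitySketch {
    static final long MB = 1024L * 1024L;

    // Number of whole lines of lineBytes bytes that fit in blockBytes bytes.
    static long linesPerBlock(long blockBytes, long lineBytes) {
        return blockBytes / lineBytes;
    }

    public static void main(String[] args) {
        long lineBytes = 100 + 1; // 100 characters plus '\n' (assumed single-byte encoding)
        System.out.println(linesPerBlock(64 * MB, lineBytes));  // 664444
        System.out.println(linesPerBlock(128 * MB, lineBytes)); // 1328888
    }
}
```

Configuring a larger block size only raises this number, so it makes a single mapper more likely, not less.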
> > >
> > > On Feb 1, 2012, at 5:06 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> > >
> > > > Anil,
> > > >
> > > > Do you mean one HDFS block, like 64 MB?
> > > >
> > > > Mark
> > > >
> > > > On Wed, Feb 1, 2012 at 7:03 PM, Anil Gupta <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Do you have enough data to start more than one mapper?
> > > >> If the entire input is smaller than one block, only one mapper will run.
> > > >>
> > > >> Best Regards,
> > > >> Anil
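Anil's point follows from how `FileInputFormat` divides a file into splits. The sketch below is a simplified model of that logic as we understand it from the Hadoop source, not the actual code: the split size is `max(minSize, min(maxSize, blockSize))`, full-size splits are cut while the remaining bytes exceed 1.1 times the split size, and whatever is left becomes the final split. It assumes a single uncompressed file and ignores data locality; the class name is hypothetical.

```java
// Simplified, hypothetical model of FileInputFormat-style split counting
// for a single uncompressed file; not Hadoop's actual implementation.
class SplitCountSketch {
    static final double SPLIT_SLOP = 1.1; // last split may exceed splitSize by up to 10%

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and hence map tasks) for one file of fileBytes bytes.
    static int countSplits(long fileBytes, long splitSize) {
        int splits = 0;
        long remaining = fileBytes;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // leftover bytes form the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(64 * mb, 1, Long.MAX_VALUE);
        // 1,000 lines of 101 bytes is about 99 KB: one split, one mapper.
        System.out.println(countSplits(1000L * 101, splitSize)); // 1
        // 70 MB is within the 10% slop of 64 MB, so still one split.
        System.out.println(countSplits(70 * mb, splitSize)); // 1
        // 200 MB: three 64 MB splits plus an 8 MB remainder.
        System.out.println(countSplits(200 * mb, splitSize)); // 4
    }
}
```

In this model, only lowering `maxSize` below the file size would produce more splits for a small file; for many short instruction lines, `NLineInputFormat` is the more direct fix.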
> > > >>
> > > >> On Feb 1, 2012, at 4:21 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> I have a simple MR job, and I want each Mapper to get one line from my
> > > >>> input file (which contains further instructions for lengthy processing).
> > > >>> Each line is 100 characters long, and I tell Hadoop to read only 100
> > > >>> bytes:
> > > >>>
> > > >>> job.getConfiguration().setInt("mapreduce.input.linerecordreader.line.maxlength", 100);
> > > >>>
> > > >>> I can see that this part works: it reads only one line at a time, and if
> > > >>> I change this parameter, the change takes effect.
> > > >>>
> > > >>> However, on a cluster only one node receives all the map tasks. Only one
> > > >>> map task is started. The other nodes never get anything; they just wait.
> > > >>> I've added a 100-second wait to the mapper, with no change!
> > > >>>
> > > >>> Any advice?
> > > >>>
> > > >>> Thank you. Sincerely,
> > > >>> Mark
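On the `maxlength` setting in Mark's original message: as far as we understand `LineRecordReader`, `mapreduce.input.linerecordreader.line.maxlength` caps the length of a line the reader will accept, and lines over the cap are skipped rather than split. The reader hands the mapper one line per record either way, so the setting has no bearing on how many splits, and therefore map tasks, the job gets. Below is a simplified model of that filtering; the class is hypothetical, and the exact boundary handling (for example, whether the newline counts toward the limit) is an assumption that may differ by Hadoop version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a line reader with a maximum line length.
// Not Hadoop's LineRecordReader; boundary handling here is an assumption.
class MaxLengthReaderSketch {

    // Returns the records a mapper would see: one per line, with lines
    // longer than maxLength dropped instead of truncated or split.
    static List<String> readRecords(List<String> lines, int maxLength) {
        List<String> records = new ArrayList<>();
        for (String line : lines) {
            if (line.length() <= maxLength) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("short", "a much longer line", "ok");
        // Still one record per accepted line; the limit only filters.
        System.out.println(readRecords(lines, 10)); // [short, ok]
    }
}
```

Nothing in this filter changes how the input is divided into splits, which is consistent with the single map task Mark observed.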
> > > >>
> > >
> >
>