Re: Can't achieve load distribution
Praveen,

This seems like just the right thing, but it's in the 0.21 API (I've googled
the problems with it), so I'd have to use the next Cloudera release, or
Hortonworks, or something similar, am I right?

Mark
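
For reference, a minimal driver sketch of the NLineInputFormat approach
recommended below, assuming the 0.21 "mapreduce" API from the linked Javadoc;
the class name, mapper body, and input/output paths are hypothetical
illustrations, not code from this thread:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OneLinePerMapper {

  // Each map() call receives a single input line; the line carries the
  // instructions for the lengthy processing.
  public static class LineMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // ... lengthy processing driven by the line's contents ...
      context.write(line, new Text("done"));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "one-line-per-mapper");
    job.setJarByClass(OneLinePerMapper.class);
    job.setMapperClass(LineMapper.class);
    job.setNumReduceTasks(0);            // map-only job
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // One input line per split, hence one line per map task.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With setNumLinesPerSplit(job, 1), each input line becomes its own split, so
the framework can schedule the map tasks across nodes instead of packing
everything into a single task.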

On Thu, Feb 2, 2012 at 7:39 AM, Praveen Sripati <[EMAIL PROTECTED]> wrote:

> > I have a simple MR job, and I want each Mapper to get one line from my
> > input file (which contains further instructions for lengthy processing).
>
> Use the NLineInputFormat class.
>
> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html
>
> Praveen
>
> On Thu, Feb 2, 2012 at 9:43 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
>
> > Thanks!
> > Mark
> >
> > On Wed, Feb 1, 2012 at 7:44 PM, Anil Gupta <[EMAIL PROTECTED]> wrote:
> >
> > > Yes, if your block size is 64 MB. Btw, block size is configurable in
> > > Hadoop.
> > >
> > > Best Regards,
> > > Anil
> > >
> > > On Feb 1, 2012, at 5:06 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> > >
> > > > Anil,
> > > >
> > > > do you mean one block of HDFS, like 64 MB?
> > > >
> > > > Mark
> > > >
> > > > On Wed, Feb 1, 2012 at 7:03 PM, Anil Gupta <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Do you have enough data to start more than one mapper?
> > > > > If the entire data is less than a block size, then only 1 mapper
> > > > > will run.
> > > > >
> > > > > Best Regards,
> > > > > Anil
> > > > >
> > > > > On Feb 1, 2012, at 4:21 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a simple MR job, and I want each Mapper to get one line
> > > > > > from my input file (which contains further instructions for
> > > > > > lengthy processing). Each line is 100 characters long, and I tell
> > > > > > Hadoop to read only 100 bytes:
> > > > > >
> > > > > > job.getConfiguration().setInt("mapreduce.input.linerecordreader.line.maxlength", 100);
> > > > > >
> > > > > > I see that this part works - it reads only one line at a time,
> > > > > > and if I change this parameter, it listens.
> > > > > >
> > > > > > However, on a cluster only one node receives all the map tasks.
> > > > > > Only one map task is started. The others never get anything; they
> > > > > > just wait. I've added a 100-second wait to the mapper - no change!
> > > > > >
> > > > > > Any advice?
> > > > > >
> > > > > > Thank you. Sincerely,
> > > > > > Mark
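
Anil's diagnosis is the usual cause of this symptom: with the default
TextInputFormat, FileInputFormat creates roughly one split per HDFS block, so
an input smaller than a single block (64 MB by default at the time) yields
exactly one map task on one node, no matter how short its lines are. The
mapreduce.input.linerecordreader.line.maxlength setting only caps how many
bytes the record reader consumes per line; it does not affect splitting.
Besides NLineInputFormat, one assumed alternative is capping the split size
via the new-API FileInputFormat helper; the class name and the 100-byte
figure below are illustrative only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SmallSplits {
  public static Job configure() throws Exception {
    Job job = new Job(new Configuration(), "small-splits");
    // Cap each split at 100 bytes: with 100-character lines, that is
    // roughly one line (and one map task) per split.
    FileInputFormat.setMaxInputSplitSize(job, 100L);
    return job;
  }
}

NLineInputFormat remains the more direct fit for exactly one line per mapper,
since byte-based splits can land mid-line (the record reader then assigns the
straddling line to the earlier split).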