
MapReduce user mailing list - Problems Mapping multigigabyte file


Re: Problems Mapping multigigabyte file
Steve Lewis 2011-10-14, 16:03
Yes, and I presume that the input is being split in both the cases that succeed and the cases that fail.
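
One way to verify that rather than presume it is to ask the input format directly how many splits it would produce. A minimal sketch, assuming the old (0.20-era) mapred API and TextInputFormat; SplitCheck and the path argument are placeholders:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class SplitCheck {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SplitCheck.class);
            // Point at the input that misbehaves, e.g. the 1GB file.
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            TextInputFormat format = new TextInputFormat();
            format.configure(conf);
            // Compute the splits exactly as the job would.
            InputSplit[] splits = format.getSplits(conf, 1);
            System.out.println("Number of splits: " + splits.length);
            for (InputSplit s : splits) {
                System.out.println(s);  // FileSplit prints file:start+length
            }
        }
    }

If it prints around 16 splits for a 1GB file on 64MB blocks, splitting is happening as expected and the problem lies elsewhere; if it prints 1, the file is not being split at all.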

On Fri, Oct 14, 2011 at 8:49 AM, Justin Woody <[EMAIL PROTECTED]> wrote:

> Steve,
>
> Is the input file splittable?
>
> Justin
>
> On Fri, Oct 14, 2011 at 11:23 AM, Steve Lewis <[EMAIL PROTECTED]>
> wrote:
> > I have an MR task which runs well with a single input file or an input
> > directory with dozens of 50MB input files.
> > When the data is in a single input file of 1 GB or more, the mapper never
> > gets past 0%. There are no errors, but when I look at the cluster, the
> > CPUs are spending huge amounts of time in a wait state. The job runs when
> > the input is 800MB and can complete even with a number of 500MB files as
> > input.
> > The cluster (Hadoop 0.20) has 8 nodes with 8 CPUs per node. Block size is
> > 64MB.
> > Any bright ideas?
> >
> > --
> > Steven M. Lewis PhD
> > 4221 105th Ave NE
> > Kirkland, WA 98033
> > 206-384-1340 (cell)
> > Skype lordjoe_com
> >
> >
> >
>
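
To Justin's question: gzip and similar stream compressors are not splittable in Hadoop, so a single compressed multigigabyte file becomes one map task that reads the whole thing, while the same data in dozens of 50MB files still fans out across the cluster. With a 64MB block size, a splittable 1GB input should yield about 16 splits (1024MB / 64MB). A quick probe to see whether a codec is associated with the file; a sketch only, SplittableProbe is a made-up name and this assumes the 0.20-era API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class SplittableProbe {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // The factory matches codecs by file extension (.gz, .bz2, ...).
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            CompressionCodec codec = factory.getCodec(new Path(args[0]));
            if (codec == null) {
                System.out.println("No codec: plain file, splittable");
            } else {
                System.out.println("Codec " + codec.getClass().getName()
                    + ": likely NOT splittable (e.g. gzip)");
            }
        }
    }

If it reports a gzip codec, storing the data uncompressed or in a container format such as a block-compressed SequenceFile would restore parallelism.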

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com