Re: Problems Mapping multigigabyte file
Yes, and I presume that both the cases that succeed and the cases that fail are being split.
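
That presumption is easy to verify directly. A minimal sketch, assuming the old org.apache.hadoop.mapred API from the 0.20 line (the SplitCheck class name and the command-line argument are hypothetical), that asks TextInputFormat how it would split the input and prints the result without running the job:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

// Prints the splits TextInputFormat would hand to the mappers.
// A single split for a multi-gigabyte file means the input is
// not being split at all.
public class SplitCheck {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SplitCheck.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    TextInputFormat format = new TextInputFormat();
    format.configure(conf);                 // loads the configured compression codecs
    InputSplit[] splits = format.getSplits(conf, 1);
    System.out.println(splits.length + " split(s) for " + args[0]);
    for (InputSplit split : splits) {
      System.out.println("  " + split);     // a FileSplit prints path, offset, and length
    }
  }
}

Running this against the 1 GB input and against one of the 50MB inputs should show whether both really produce multiple splits.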

On Fri, Oct 14, 2011 at 8:49 AM, Justin Woody <[EMAIL PROTECTED]> wrote:

> Steve,
>
> Is the input file splittable?
>
> Justin
>
> On Fri, Oct 14, 2011 at 11:23 AM, Steve Lewis <[EMAIL PROTECTED]>
> wrote:
> > I have an MR task which runs well with a single input file or an input
> > directory with dozens of 50MB input files.
> > When the data is in a single input file of 1 GB or more, the mapper never
> > gets past 0%. There are no errors, but when I look at the cluster, the
> > CPUs are spending huge amounts of time in a wait state. The job runs when
> > the input is 800MB and can complete even with a number of 500MB files as
> > input.
> > The cluster (0.20) has 8 nodes with 8 CPUs per node. Block size is 64MB.
> > Any bright ideas?
> >
> > --
> > Steven M. Lewis PhD
> > 4221 105th Ave NE
> > Kirkland, WA 98033
> > 206-384-1340 (cell)
> > Skype lordjoe_com
> >
> >
> >
>

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
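
Justin's splittability question is the natural first check here: in the 0.20-era TextInputFormat, a file is split only when no compression codec matches its name, so a single gzipped 1 GB input is handed whole to one mapper, while the same data in dozens of 50MB files spreads across the cluster. A minimal sketch of that codec test (the SplittabilityCheck class name is hypothetical; CompressionCodecFactory is the class the input format actually consults):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

// Mirrors the isSplitable() test in the old TextInputFormat:
// a file is splittable only if no compression codec matches its suffix.
public class SplittabilityCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    CompressionCodecFactory codecs = new CompressionCodecFactory(conf);
    Path file = new Path(args[0]);
    CompressionCodec codec = codecs.getCodec(file);
    if (codec == null) {
      System.out.println(file + ": no codec matched; TextInputFormat will split it");
    } else {
      System.out.println(file + ": matched " + codec.getClass().getSimpleName()
          + "; the whole file will go to a single mapper");
    }
  }
}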