-
Problems Mapping multigigabyte file
Steve Lewis 2011-10-14, 15:23
I have an MR task which runs well with a single input file or an input directory with dozens of 50MB input files.
When the data is in a single input file of 1 GB or more, the map phase never gets past 0%. There are no errors, but when I look at the cluster, the CPUs are spending huge amounts of time in a wait state. The job runs when the input is 800MB, and it can complete even with a number of 500MB files as input.
The cluster (0.02) has 8 nodes with 8 CPUs per node. Block size is 64MB.
Any bright ideas?
-- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com
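For reference, here is a rough model of how a splittable file is cut into map inputs with a 64MB block size. It is a simplified Python sketch, assuming the split logic of Hadoop's new-API `FileInputFormat` (split size = max(minSize, min(maxSize, blockSize)), with the last split allowed to run up to 10% over that size). It ignores block locations, and the function names are hypothetical. Under those assumptions, 500MB, 800MB and 1GB files yield 8, 13 and 16 map tasks respectively, so a splittable 1GB file should produce only a few more map tasks than the 800MB file that works.

```python
from typing import List

MB = 1024 * 1024
# Mirrors SPLIT_SLOP in Hadoop's FileInputFormat: the last split may be up to
# 10% larger than the nominal split size instead of becoming a tiny split.
SPLIT_SLOP = 1.1


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    """Nominal split size: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_size: int, block_size: int,
                  splittable: bool = True) -> List[int]:
    """Lengths of the input splits for one non-empty file (simplified model)."""
    if not splittable:
        # A non-splittable file becomes one split, read by a single mapper.
        return [file_size]
    split_size = compute_split_size(block_size)
    lengths = []
    remaining = file_size
    # Emit full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    if remaining != 0:
        # Final split: possibly short, or up to 10% oversized.
        lengths.append(remaining)
    return lengths


if __name__ == "__main__":
    block = 64 * MB
    for size_mb in (500, 800, 1024):
        n = len(split_lengths(size_mb * MB, block))
        print(f"{size_mb} MB file -> {n} map tasks")
    # 500 MB file -> 8 map tasks
    # 800 MB file -> 13 map tasks
    # 1024 MB file -> 16 map tasks
```

The slop factor is why a 70MB file with 64MB blocks still becomes a single split in this model.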
-
Re: Problems Mapping multigigabyte file
Justin Woody 2011-10-14, 15:49
Steve,
Is the input file splittable?
Justin
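The question matters because a file that cannot be split is handed to a single mapper no matter how many blocks it spans. In Hadoop, `TextInputFormat.isSplitable()` makes that decision by looking up a compression codec from the file name: uncompressed text is splittable, while gzip is not. The Python sketch below approximates that check by file suffix only. The suffix list, the paths and the function names are assumptions for illustration, not Hadoop API. Under these assumptions, a 1GB plain-text file gives 16 map tasks with 64MB blocks, but the same data gzipped gives just one.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # last split may be up to 10% over the block size

# Hypothetical, deliberately incomplete list: gzip streams cannot be decoded
# from an arbitrary offset, so a .gz file is read as one split.
NON_SPLITTABLE_SUFFIXES = (".gz",)


def is_splittable(path: str) -> bool:
    """Approximate TextInputFormat's codec check using the file suffix only."""
    return not path.lower().endswith(NON_SPLITTABLE_SUFFIXES)


def expected_map_tasks(path: str, file_size: int, block_size: int) -> int:
    """Rough number of map tasks for one non-empty file, default split settings."""
    if not is_splittable(path):
        return 1  # the whole file goes to a single mapper
    count, remaining = 0, file_size
    while remaining / block_size > SPLIT_SLOP:
        count += 1
        remaining -= block_size
    return count + (1 if remaining else 0)


if __name__ == "__main__":
    for name in ("input/big.txt", "input/big.txt.gz"):  # hypothetical paths
        print(name, expected_map_tasks(name, 1024 * MB, 64 * MB))
    # input/big.txt 16
    # input/big.txt.gz 1
```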
-
Re: Problems Mapping multigigabyte file
Steve Lewis 2011-10-14, 16:03
Yes, and I presume that the input is being split in both the cases that succeed and those that fail.