

Problems Mapping multigigabyte file
I have an MR task which runs well with a single input file or an input
directory with dozens of 50MB input files.

When the data is in a single input file of 1 GB or more, the mapper never
gets past 0%. There are no errors, but when I look at the cluster, the CPUs
are spending huge amounts of time in a wait state. The job runs when the
input is 800MB and can complete even with a number of 500MB files as input.

The cluster (Hadoop 0.20) has 8 nodes, 8 CPUs per node. The block size is 64MB.
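
For what it's worth, here is a rough sketch of how I would check how many
splits the 1 GB file actually produces (this assumes the old
org.apache.hadoop.mapred API and TextInputFormat; SplitCheck is just a
throwaway driver, not part of the real job). With a 64MB block size a
splittable 1 GB file should come back as roughly 16 splits.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

// Throwaway driver: print the splits TextInputFormat computes for the input path.
public class SplitCheck {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitCheck.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));

        TextInputFormat inputFormat = new TextInputFormat();
        inputFormat.configure(conf);

        // With a 64MB block size, a splittable 1 GB file should yield ~16 splits;
        // a single split would mean the whole file goes to one mapper.
        InputSplit[] splits = inputFormat.getSplits(conf, 1);
        System.out.println("splits = " + splits.length);
        for (InputSplit split : splits) {
            System.out.println(split);
        }
    }
}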

Any bright ideas?

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com