Mapper Record Spillage
I am attempting to speed up a mapping process whose input is GZIP-compressed
CSV files. The files range from 1-2 GB, and I am running on a cluster where each
node has a total of 32 GB of memory available. I have tried setting
mapred.map.child.java.opts to -Xmx4096m and io.sort.mb to 2048 to accommodate
the size, but I keep getting Java heap errors or other memory-related
problems. My row count per mapper is several orders of magnitude below the
Integer.MAX_VALUE limit, and the box is NOT using anywhere close to its
full memory allotment. How can I specify that this map task may use 3-4 GB
of memory for the collection, partition, and sort process without constantly
spilling records to disk?
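
For reference, a minimal sketch of the per-job settings described above, assuming Hadoop 1.x property names: mapred.map.child.java.opts sets the map-task heap (the -Xmx flag takes an "m" suffix, not "mb"), and io.sort.mb sizes the in-memory collect/sort buffer, which must both fit inside that heap and stay below 2048 MB because it is backed by a single int-indexed byte array. The specific values here are illustrative, not tuned.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SpillTuningSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // 3 GB heap per map-task JVM; note the suffix is "m", not "mb".
            conf.set("mapred.map.child.java.opts", "-Xmx3072m");

            // In-memory sort buffer in MB; must fit inside the heap above
            // and stay below 2048 (one int-indexed byte array backs it).
            conf.setInt("io.sort.mb", 1536);

            // Let the buffer fill to 90% before the background spill thread
            // starts (default 0.80), reducing intermediate spills.
            conf.setFloat("io.sort.spill.percent", 0.90f);

            Job job = new Job(conf, "gzip-csv-mapper");
            // ... set mapper class, input/output formats and paths as usual ...
        }
    }

With the buffer sized to hold the full map output of a 1-2 GB split, the collect, partition, and sort pass can finish with only the single final spill to disk instead of repeated intermediate ones.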