MapReduce >> mail # user >> Mapper Record Spillage


Hans Uhlig 2012-03-11, 04:00
WangRamon 2012-03-11, 04:05
Re: Mapper Record Spillage
I am attempting to specify this for a single job during its
creation/submission, not via the general construct. I am using the new API,
so I am adding the values to the conf passed into new Job().

2012/3/10 WangRamon <[EMAIL PROTECTED]>

> How many map/reduce task slots do you have on each node? If the
> total number is 10, then you will use 10 * 4096 MB of memory when all tasks are
> running, which is more than the 32 GB of total memory you have on each node.
>
> ------------------------------
> Date: Sat, 10 Mar 2012 20:00:13 -0800
> Subject: Mapper Record Spillage
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> I am attempting to speed up a mapping process whose input is GZIP-compressed
> CSV files. The files range from 1 to 2 GB, and I am running on a cluster where
> each node has a total of 32 GB of memory available. I have attempted to set
> mapred.map.child.jvm.opts to -Xmx4096mb and io.sort.mb to 2048 to accommodate
> the size, but I keep getting Java heap errors or other memory-related
> problems. My row count per mapper is several orders of magnitude below the
> Integer.MAX_VALUE limit, and the box is NOT using anywhere close to its
> full memory allotment. How can I specify that this map task can have 3-4
> GB of memory for the collection, partition, and sort process without constantly
> spilling records to disk?
>
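
For anyone reading this thread later, here is a rough sketch in plain Java of three checks that may be worth running against the numbers quoted above. It is not taken from Hadoop. The class and method names are hypothetical, and the rules it encodes are assumptions based on my understanding of the JVM and of the MapTask source from that era: -Xmx takes a number with a k, m, or g suffix (so "4096mb" would be rejected), and io.sort.mb has to fit in 11 bits, which caps it at 2047. The property is also usually documented as mapred.map.child.java.opts, so the spelling in the quoted message may deserve a second look.

```java
// Sanity checks for the settings discussed in the quoted messages.
// All names here are hypothetical and exist only for illustration.
class MapperMemoryCheck {

    // Worst-case heap demand if every task slot on a node is occupied.
    static long totalTaskHeapMb(int slotsPerNode, long heapPerTaskMb) {
        return slotsPerNode * heapPerTaskMb;
    }

    // Assumption: this models only the common -Xmx forms, a number followed
    // by an optional k, m, or g suffix. "4096m" passes, "4096mb" does not.
    static boolean isValidXmx(String opt) {
        return opt.matches("-Xmx\\d+[kKmMgG]?");
    }

    // Assumption: mirrors a bit-mask check in MapTask that rejects any
    // io.sort.mb value that does not fit in 11 bits (maximum 2047).
    static boolean passesIoSortMbCheck(int sortMb) {
        return (sortMb & 0x7FF) == sortMb;
    }

    public static void main(String[] args) {
        long nodeMb = 32L * 1024;                  // 32 GB per node
        long demandMb = totalTaskHeapMb(10, 4096); // 10 slots at 4096 MB each
        System.out.println("heap demand " + demandMb + " MB vs node "
                + nodeMb + " MB, fits: " + (demandMb <= nodeMb));

        System.out.println("-Xmx4096mb valid: " + isValidXmx("-Xmx4096mb"));
        System.out.println("-Xmx4096m valid:  " + isValidXmx("-Xmx4096m"));

        System.out.println("io.sort.mb=2048 ok: " + passesIoSortMbCheck(2048));
        System.out.println("io.sort.mb=2047 ok: " + passesIoSortMbCheck(2047));
    }
}
```

With the figures from the thread, the first check reports 40960 MB of demand against 32768 MB per node, and both quoted settings fail their checks. Whether any of this explains the heap errors on the actual cluster is not something the sketch can confirm.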
Harsh J 2012-03-11, 04:41
Hans Uhlig 2012-03-11, 05:54
Harsh J 2012-03-11, 07:50
Hans Uhlig 2012-03-11, 08:06
Harsh J 2012-03-11, 13:38
Harsh J 2012-03-11, 13:39
George Datskos 2012-03-13, 02:02