Re: Mapper Record Spillage
I am attempting to specify this for a single job during its
creation/submission, not via the general cluster-wide configuration. I am
using the new API, so I am adding the values to the conf passed into new Job().
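Roughly like this, as a minimal sketch -- the property names are what I believe
apply to the 0.20.x/1.x new API (mapred.map.child.java.opts, or
mapred.child.java.opts on older versions, plus io.sort.mb), and the 4 GB heap /
2048 MB buffer are just the values I'm trying:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CsvImportDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Per-job overrides, set on the conf *before* new Job(conf),
            // since Job copies the configuration at construction time.
            conf.set("mapred.map.child.java.opts", "-Xmx4096m"); // map-task JVM heap
            conf.setInt("io.sort.mb", 2048);                     // in-memory sort buffer, MB

            Job job = new Job(conf, "csv import");
            // ... setJarByClass, setMapperClass, input/output formats and paths ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }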

2012/3/10 WangRamon <[EMAIL PROTECTED]>

>  How many map/reduce task slots do you have on each node? If the
> total number is 10, then you will use 10 * 4096 MB of memory when all tasks are
> running, which is more than the 32 GB total you have on each node.
>
> ------------------------------
> Date: Sat, 10 Mar 2012 20:00:13 -0800
> Subject: Mapper Record Spillage
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> I am attempting to speed up a mapping process whose input is GZIP-compressed
> CSV files. The files range from 1-2 GB, and I am running on a cluster where
> each node has a total of 32 GB of memory available. I have attempted to set
> mapred.map.child.jvm.opts to -Xmx4096mb and io.sort.mb to 2048 to accommodate
> the size, but I keep getting Java heap errors or other memory-related
> problems. My row count per mapper is below the Integer.MAX_VALUE limit by
> several orders of magnitude, and the box is NOT using anywhere close to its
> full memory allotment. How can I specify that this map task can have 3-4 GB
> of memory for the collection, partition, and sort process without constantly
> spilling records to disk?
>
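For what it's worth, the per-node memory math I'm working from is roughly:

    total child heap ~= (map slots + reduce slots) * -Xmx

so with, say, 4 map slots and 2 reduce slots at -Xmx4096m that would be
6 * 4 GB = 24 GB, which should still leave headroom out of 32 GB for the
TaskTracker and DataNode daemons. Those slot counts are just an example; they
are set per TaskTracker via mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum.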