Re: Mapper Record Spillage
That was a typo in my email, not in the configuration. Is the memory
reserved for the tasks when the TaskTracker starts? You seem to be
suggesting that I need to set the same memory for all map tasks. Is
there no way to override it for a single map task?
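
These child-JVM settings are applied per job when its tasks launch, not
reserved when the TaskTracker starts, so a single job can override them
at submission time. A minimal sketch, assuming Hadoop 1.x property
names; the HighMemJob class and the 1024 MB sort buffer are
illustrative, not values from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class HighMemJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Applies to this job's map tasks only (note "m", not "mb"):
            conf.set("mapred.map.child.java.opts", "-Xmx4096m");
            // Map-side sort buffer, in MB; it is allocated inside the
            // task heap, so it must leave room for the mapper itself:
            conf.setInt("io.sort.mb", 1024);
            Job job = new Job(conf, "high-memory mapper");
            // ... set Mapper class, InputFormat, and paths here ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Jobs that go through ToolRunner accept the same override on the command
line, e.g. -Dmapred.map.child.java.opts=-Xmx4096m.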

On Sat, Mar 10, 2012 at 8:41 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hans,
>
> It's possible you have a typo: mapred.map.child.jvm.opts -
> such a property does not exist. Perhaps you wanted
> "mapred.map.child.java.opts"?
>
> Additionally, the computation you need to do is: (# of map slots on a
> TT * per-map-task heap requirement) should stay below (total RAM -
> 2-3 GB reserved for the OS and daemons). With your 4 GB requirement, I
> guess you can support a max of 6-7 slots per machine (not counting
> reducer heap requirements running in parallel).
>
> On Sun, Mar 11, 2012 at 9:30 AM, Hans Uhlig <[EMAIL PROTECTED]> wrote:
> > I am attempting to speed up a mapping process whose input is
> > GZIP-compressed CSV files. The files range from 1-2 GB, and I am
> > running on a cluster where each node has a total of 32 GB of memory
> > available. I have attempted to tweak mapred.map.child.jvm.opts with
> > -Xmx4096mb and set io.sort.mb to 2048 to accommodate the size, but I
> > keep getting Java heap errors or other memory-related problems. My
> > row count per mapper is well below the Integer.MAX_VALUE limit by
> > several orders of magnitude, and the box is NOT using anywhere close
> > to its full memory allotment. How can I specify that this map task
> > can have 3-4 GB of memory for the collection, partition, and sort
> > process without constantly spilling records to disk?
>
>
>
> --
> Harsh J
>
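
To make the heap arithmetic in the thread concrete, here is a worked
version of the slot computation, with the 3 GB OS/daemon reservation
taken as an assumption (one reading of Harsh's "2-3 GB"):

    public class MapperMemoryMath {
        public static void main(String[] args) {
            double totalRamGb = 32.0; // per-node RAM, from the thread
            double reservedGb = 3.0;  // assumed OS + DataNode/TaskTracker overhead
            double mapHeapGb  = 4.0;  // desired -Xmx per map task

            // Slots per node = (total RAM - reservation) / per-task heap:
            int maxMapSlots = (int) ((totalRamGb - reservedGb) / mapHeapGb);
            System.out.println("max map slots per node ~= " + maxMapSlots); // 7

            // io.sort.mb is carved out of the same task heap; a 2048 MB
            // buffer in a 4096 MB heap leaves ~2 GB for the mapper's own
            // objects, which can still produce heap errors with large records.
            int ioSortMb = 2048;
            int heapMb = (int) (mapHeapGb * 1024);
            System.out.println("heap left after sort buffer: "
                + (heapMb - ioSortMb) + " MB");
        }
    }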