MapReduce >> mail # user >> Mapper Record Spillage


Hans Uhlig 2012-03-11, 04:00
WangRamon 2012-03-11, 04:05
Hans Uhlig 2012-03-11, 04:08
Harsh J 2012-03-11, 04:41
Re: Mapper Record Spillage
That was a typo in my email not in the configuration. Is the memory
reserved for the tasks when the task tracker starts? You seem to be
suggesting that I need to set the memory to be the same for all map tasks.
Is there no way to override for a single map task?
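To the per-job question: when a job's driver class goes through ToolRunner (and hence GenericOptionsParser), configuration properties passed on the command line with -D apply only to that submission, leaving the cluster-wide mapred-site.xml untouched. A hypothetical invocation, using the property names discussed in this thread (the jar and driver class names are placeholders):

```shell
# Hypothetical job submission; myjob.jar and MyDriver are placeholders.
# The -D properties override the cluster defaults for this job only.
hadoop jar myjob.jar MyDriver \
  -D mapred.map.child.java.opts=-Xmx4096m \
  -D io.sort.mb=1024 \
  /input/path /output/path
```

This only works if the driver actually delegates to ToolRunner/GenericOptionsParser; a driver that builds its JobConf by hand ignores -D flags.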

On Sat, Mar 10, 2012 at 8:41 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hans,
>
> It's possible you have a typo issue: mapred.map.child.jvm.opts -
> such a property does not exist. Perhaps you wanted
> "mapred.map.child.java.opts"?
>
> Additionally, the computation you need to do is: (# of map slots on a
> TT * per-map-task heap requirement) should stay below (Total RAM -
> 2 to 3 GB reserved for the OS and daemons). With your 4 GB requirement,
> I guess you can support a max of 6-7 slots per machine (not counting
> reducer heap requirements in parallel).
>
> On Sun, Mar 11, 2012 at 9:30 AM, Hans Uhlig <[EMAIL PROTECTED]> wrote:
> > I am attempting to speed up a mapping process whose input is GZIP
> > compressed CSV files. The files range from 1-2GB, and I am running on
> > a cluster where each node has a total of 32GB memory available to use.
> > I have attempted to tweak mapred.map.child.jvm.opts with -Xmx4096mb
> > and io.sort.mb to 2048 to accommodate the size, but I keep getting
> > java heap errors or other memory related problems. My row count per
> > mapper is well below the Integer.MAX_INTEGER limit by several orders
> > of magnitude and the box is NOT using anywhere close to its full
> > memory allotment. How can I specify that this map task can have
> > 3-4 GB of memory for the collection, partition and sort process
> > without constantly spilling records to disk?
>
>
>
> --
> Harsh J
>
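Harsh's sizing rule can be made concrete with a quick sketch. The 32 GB figure is from Hans's mail; the 3 GB reservation for the OS and Hadoop daemons follows Harsh's "2/3 GB" estimate:

```shell
# Heap-budget sketch: map slots * per-task heap must stay under total
# RAM minus a reservation for the OS and Hadoop daemons.
TOTAL_RAM_GB=32      # per-node RAM, from the original mail
RESERVED_GB=3        # OS + TaskTracker/DataNode, per Harsh's estimate
PER_MAP_HEAP_GB=4    # the -Xmx requested per map task
MAX_MAP_SLOTS=$(( (TOTAL_RAM_GB - RESERVED_GB) / PER_MAP_HEAP_GB ))
echo "$MAX_MAP_SLOTS"   # prints 7, matching the quoted 6-7 slot estimate
```

Reducer slots need the same treatment: if reduce tasks also get multi-GB heaps and run in parallel with maps, the per-node slot budget shrinks accordingly.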
Harsh J 2012-03-11, 07:50
Hans Uhlig 2012-03-11, 08:06
Harsh J 2012-03-11, 13:38
Harsh J 2012-03-11, 13:39
George Datskos 2012-03-13, 02:02