Re: Mapper Record Spillage
If that is the case, then these two lines should provide more than enough
memory on a virtually unused cluster.

job.getConfiguration().setInt("io.sort.mb", 2048); // map-side sort buffer, in MB
job.getConfiguration().set("mapred.map.child.java.opts", "-Xmx3072M"); // per-map-task heap

That should let a conversion from 1 GB of CSV text to binary primitives fit
easily, but Java still throws a heap error even when there is 25 GB of
memory free.
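
For reference, a minimal self-contained driver sketch built around those two
settings (the CsvToBinaryDriver/CsvToBinaryMapper names are invented for
illustration; the io.sort.mb comment reflects 0.20.x MapTask behavior as I
understand it, since the sort buffer is allocated inside the task heap):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CsvToBinaryDriver {

  // Placeholder mapper: a real job would parse the CSV fields into
  // binary writables here.
  public static class CsvToBinaryMapper
      extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(key, value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Map-side sort buffer, in MB. MapTask allocates this buffer inside
    // the task heap and (as I read the 0.20.x source) rejects values of
    // 2048 and above, so 2047 or less is the safe ceiling.
    conf.setInt("io.sort.mb", 1024);
    // Per-map-task JVM heap; must leave headroom above io.sort.mb.
    conf.set("mapred.map.child.java.opts", "-Xmx3072m");

    Job job = new Job(conf, "csv-to-binary");
    job.setJarByClass(CsvToBinaryDriver.class);
    job.setMapperClass(CsvToBinaryMapper.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}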

On Sat, Mar 10, 2012 at 11:50 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hans,
>
> You can change memory requirements for tasks of a single job, but not
> of a single task inside that job.
>
> This is briefly how the 0.20 framework (by default) works: the TT has
> notions only of "slots", and carries a maximum _number_ of
> simultaneous slots it may run. It does not know what each task,
> occupying one slot, will demand in resource terms. Your job then
> supplies a number of map tasks, and the amount of memory required per
> map task, as configuration. TTs then merely start the task JVMs with
> the provided heap configuration. (This split is sketched in code
> after the quoted thread below.)
>
> On Sun, Mar 11, 2012 at 11:24 AM, Hans Uhlig <[EMAIL PROTECTED]> wrote:
> > That was a typo in my email, not in the configuration. Is the memory
> > reserved for the tasks when the task tracker starts? You seem to be
> > suggesting that I need to set the memory to be the same for all map
> > tasks. Is there no way to override it for a single map task?
> >
> >
> > On Sat, Mar 10, 2012 at 8:41 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>
> >> Hans,
> >>
> >> It's possible you have a typo: mapred.map.child.jvm.opts -
> >> such a property does not exist. Perhaps you wanted
> >> "mapred.map.child.java.opts"?
> >>
> >> Additionally, the computation you need to do is: (# of map slots on a
> >> TT * per-map-task heap requirement) should stay below (total RAM -
> >> 2-3 GB). With your 4 GB requirement, I guess you can support a max of
> >> 6-7 slots per machine (i.e., not counting reducer heap requirements
> >> running in parallel). (Worked through in code after the quoted
> >> thread below.)
> >>
> >> On Sun, Mar 11, 2012 at 9:30 AM, Hans Uhlig <[EMAIL PROTECTED]> wrote:
> >> > I am attempting to speed up a mapping process whose input is
> >> > GZIP-compressed CSV files. The files range from 1-2 GB, and I am
> >> > running on a cluster where each node has a total of 32 GB of memory
> >> > available. I have attempted to tweak mapred.map.child.jvm.opts with
> >> > -Xmx4096mb and io.sort.mb to 2048 to accommodate the size, but I
> >> > keep getting Java heap errors or other memory-related problems. My
> >> > row count per mapper is below the Integer.MAX_VALUE limit by
> >> > several orders of magnitude, and the box is NOT using anywhere
> >> > close to its full memory allotment. How can I specify that this map
> >> > task can have 3-4 GB of memory for the collection, partition and
> >> > sort process without constantly spilling records to disk?
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>
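
To make the slot-vs-heap split Harsh describes concrete, here is a minimal
sketch, assuming 0.20/1.x property names (mapred.tasktracker.map.tasks.maximum
is the TaskTracker-side slot knob; the class name is made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlotModelExample {
  public static void main(String[] args) throws Exception {
    // Daemon side (mapred-site.xml on each TaskTracker; needs a TT
    // restart, not settable per job):
    //   mapred.tasktracker.map.tasks.maximum = 7
    // The TT only counts slots; it knows nothing of per-task memory.

    Configuration conf = new Configuration();
    // Job side: one heap setting shared by every map task of this job.
    // There is no per-task override in the 0.20 framework.
    conf.set("mapred.map.child.java.opts", "-Xmx3072m");
    Job job = new Job(conf, "slot-model-example");
    // ... set mapper, input/output paths, etc., then submit as usual.
  }
}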
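
And the 6-7 slot estimate, worked through with this thread's numbers
(assuming the 2-3 GB set-aside is meant for the OS plus Hadoop daemons; the
class name is again made up):

public class SlotMath {
  public static void main(String[] args) {
    int totalRamGb   = 32; // per node, from the thread
    int reserveGb    = 3;  // upper end of the "2-3 GB" set-aside above
    int perMapHeapGb = 4;  // requested per-map-task heap
    // Constraint: slots * per-task heap < total RAM - reserve
    int maxMapSlots = (totalRamGb - reserveGb) / perMapHeapGb;
    System.out.println("max map slots per TT: " + maxMapSlots); // prints 7
  }
}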