Re: OutOfMemory during Plain Java MapReduce
Michael Segel 2013-03-08, 14:39
"A potential problem could be, that a reduce is going to write files >600MB and our mapred.child.java.opts is set to ~380MB."

Isn't the minimum heap normally 512MB?

Why not just increase your child heap size, assuming you have enough memory on the box...
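
For instance, a minimal sketch of raising the child heap at job submission time (mapred.child.java.opts is the property already named in the quote above; the -Xmx value and job name are illustrative, not recommendations):

// Sketch: bumping the per-task child JVM heap when submitting the job.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithBiggerHeap {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Size this to the memory available per task slot on the nodes.
        conf.set("mapred.child.java.opts", "-Xmx1024m");
        // Some releases of this era also honor a reduce-only variant,
        // useful if only the reducers need the extra room:
        // conf.set("mapred.reduce.child.java.opts", "-Xmx1024m");
        Job job = new Job(conf, "user-to-app");  // job name is hypothetical
        // ... configure mapper/reducer/paths, then job.waitForCompletion(true)
    }
}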
On Mar 8, 2013, at 4:57 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hi,
>
> When you implement code that stores in-memory copies of the values for
> every record (even for just a single key), things are going to break
> in big-data-land. Practically, post-partitioning, the number of values
> for a given key can be huge given the source data, so you cannot hold
> them all in memory and then write in one go. You'd probably need to
> write out something continuously if you really want to do this, or use
> an alternative form of key-value storage where updates can be made
> incrementally (Apache HBase is one example of such a store).
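
(As an illustration of the "write out something continuously" approach: a minimal sketch of a reducer that emits each value as it streams in instead of collecting String copies per key. The class name and Text/Text types are assumptions modeled on the stack trace further down; if the original Set was there to deduplicate values, that step would have to move elsewhere, e.g. into the key of a prior MapReduce pass.)

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch: stream each value straight to the output; nothing accumulates
// on the heap, no matter how many values a key has.
public class StreamingReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Hadoop reuses the Text instance between iterations, but
            // context.write serializes it immediately, so no copy is kept.
            context.write(key, value);
        }
    }
}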
>
> This has been discussed before, IIRC, and if the goal is to store the
> outputs in a file, then it's better to serialize them directly to an
> open file instead of keeping them in a data structure and serializing
> everything at the end. The caveats that apply if you open your own
> file from a task are described at
> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F.
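
(For the "open your own file from a task" route the FAQ describes: a minimal sketch, assuming the new mapreduce API, that writes under the task's work output path so failed or speculative attempts are discarded by the output committer. All names here are illustrative.)

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SideFileReducer extends Reducer<Text, Text, Text, Text> {
    private FSDataOutputStream out;

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // A file unique to this task attempt, created under the work
        // path so it is promoted into the job output only on commit.
        Path workDir = FileOutputFormat.getWorkOutputPath(context);
        Path sideFile = new Path(workDir, "side-" + context.getTaskAttemptID());
        FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
        out = fs.create(sideFile);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Serialize each record as it is seen instead of buffering.
            out.writeBytes(key.toString() + "\t" + value.toString() + "\n");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        out.close();
    }
}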
>
> On Fri, Mar 8, 2013 at 4:35 AM, Christian Schneider
> <[EMAIL PROTECTED]> wrote:
>> I had a look at the stack trace and it says the problem is in the reducer:
>> userSet.add(iterator.next().toString());
>>
>> Error: Java heap space
>> attempt_201303072200_0016_r_000002_0: WARN : mapreduce.Counters - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
>> attempt_201303072200_0016_r_000002_0: WARN : org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead, use dfs.metrics.session-id
>> attempt_201303072200_0016_r_000002_0: WARN : org.apache.hadoop.conf.Configuration - slave.host.name is deprecated. Instead, use dfs.datanode.hostname
>> attempt_201303072200_0016_r_000002_0: FATAL: org.apache.hadoop.mapred.Child - Error running child : java.lang.OutOfMemoryError: Java heap space
>> attempt_201303072200_0016_r_000002_0: at java.util.Arrays.copyOfRange(Arrays.java:3209)
>> attempt_201303072200_0016_r_000002_0: at java.lang.String.<init>(String.java:215)
>> attempt_201303072200_0016_r_000002_0: at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
>> attempt_201303072200_0016_r_000002_0: at java.nio.CharBuffer.toString(CharBuffer.java:1157)
>> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.io.Text.decode(Text.java:394)
>> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.io.Text.decode(Text.java:371)
>> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.io.Text.toString(Text.java:273)
>> attempt_201303072200_0016_r_000002_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:21)
>> attempt_201303072200_0016_r_000002_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:1)
>> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
>> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
>> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
>> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>> attempt_201303072200_0016_r_000002_0: at java.security.AccessController.doPrivileged(Native Method)
>> attempt_201303072200_0016_r_000002_0: at javax.security.auth.Subject.doAs(Subject.java:396)
>> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>