MapReduce, mail # user - Re: OutOfMemory during Plain Java MapReduce


Re: OutOfMemory during Plain Java MapReduce
Harsh J 2013-03-08, 10:57
Hi,

When your code stores copies of every value in memory for every record (even
for just a single key), things are going to break in big-data land. In
practice, after partitioning, the number of values for a given key can be
huge depending on the source data, so you cannot hold them all in memory and
write them out in one go. If you really want to do this, you'd need to write
the output out continuously instead, or use an alternative form of key-value
storage where updates can be made incrementally (Apache HBase is one example
of such a store).
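
For example (a rough sketch only -- your UserToAppReducer code isn't shown
here, so the class and field names below are illustrative assumptions), a
reducer that emits each value as it is iterated keeps memory flat no matter
how many users a key has:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer: stream each value straight to the output instead of
// collecting them all into an in-memory Set first.
public class StreamingUserReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text userId : values) {
      // One output record per (key, user) pair; nothing accumulates on the heap.
      context.write(key, userId);
    }
  }
}

If the Set was there to de-duplicate user ids, a secondary sort on the value
would let you skip adjacent duplicates while still streaming.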

This has been discussed before, IIRC. If the goal is to store the outputs in
a file, it's better to serialize them directly to an open file than to keep
them in a data structure and serialize everything at the end. The caveats
that apply if you open your own file from a task are described at
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F.
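
A rough sketch of that side-file approach, under the caveats from the FAQ
(write into the task's attempt-specific work directory so speculative or
failed attempts don't collide; names like SideFileReducer and the "users-"
file prefix below are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SideFileReducer extends Reducer<Text, Text, Text, Text> {
  private FSDataOutputStream sideFile;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // The work output path is attempt-specific and gets promoted only if the
    // attempt succeeds, which avoids clashes from speculative execution.
    Path workDir = FileOutputFormat.getWorkOutputPath(context);
    Path sidePath = new Path(workDir,
        "users-" + context.getTaskAttemptID().getTaskID().getId());
    FileSystem fs = sidePath.getFileSystem(context.getConfiguration());
    sideFile = fs.create(sidePath, false);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text userId : values) {
      // Serialize each record as it is seen instead of buffering it.
      sideFile.writeBytes(key + "\t" + userId + "\n");
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    sideFile.close();
  }
}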

On Fri, Mar 8, 2013 at 4:35 AM, Christian Schneider
<[EMAIL PROTECTED]> wrote:
> I had a look at the stack trace and it says the problem is in the reducer:
> userSet.add(iterator.next().toString());
>
> Error: Java heap space
> attempt_201303072200_0016_r_000002_0: WARN : mapreduce.Counters - Group
> org.apache.hadoop.mapred.Task$Counter is deprecated. Use
> org.apache.hadoop.mapreduce.TaskCounter instead
> attempt_201303072200_0016_r_000002_0: WARN :
> org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead,
> use dfs.metrics.session-id
> attempt_201303072200_0016_r_000002_0: WARN :
> org.apache.hadoop.conf.Configuration - slave.host.name is deprecated.
> Instead, use dfs.datanode.hostname
> attempt_201303072200_0016_r_000002_0: FATAL: org.apache.hadoop.mapred.Child
> - Error running child : java.lang.OutOfMemoryError: Java heap space
> attempt_201303072200_0016_r_000002_0: at
> java.util.Arrays.copyOfRange(Arrays.java:3209)
> attempt_201303072200_0016_r_000002_0: at
> java.lang.String.<init>(String.java:215)
> attempt_201303072200_0016_r_000002_0: at
> java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
> attempt_201303072200_0016_r_000002_0: at
> java.nio.CharBuffer.toString(CharBuffer.java:1157)
> attempt_201303072200_0016_r_000002_0: at
> org.apache.hadoop.io.Text.decode(Text.java:394)
> attempt_201303072200_0016_r_000002_0: at
> org.apache.hadoop.io.Text.decode(Text.java:371)
> attempt_201303072200_0016_r_000002_0: at
> org.apache.hadoop.io.Text.toString(Text.java:273)
> attempt_201303072200_0016_r_000002_0: at
> com.myCompany.UserToAppReducer.reduce(RankingReducer.java:21)
> attempt_201303072200_0016_r_000002_0: at
> com.myCompany.UserToAppReducer.reduce(RankingReducer.java:1)
> attempt_201303072200_0016_r_000002_0: at
> org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
> attempt_201303072200_0016_r_000002_0: at
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
> attempt_201303072200_0016_r_000002_0: at
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
> attempt_201303072200_0016_r_000002_0: at
> org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> attempt_201303072200_0016_r_000002_0: at
> java.security.AccessController.doPrivileged(Native Method)
> attempt_201303072200_0016_r_000002_0: at
> javax.security.auth.Subject.doAs(Subject.java:396)
> attempt_201303072200_0016_r_000002_0: at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> attempt_201303072200_0016_r_000002_0: at
> org.apache.hadoop.mapred.Child.main(Child.java:262)
>
> But how do I solve this?
>
>
> 2013/3/7 Christian Schneider <[EMAIL PROTECTED]>
>>
>> Hi,
>> during the Reduce phase or afterwards (I don't really know how to debug
>> it) I get a heap OutOfMemory exception.
>>
>> I guess this is because the value of the reduce task (a Custom Writable)
>> holds a List with a lot of user ids.
>> The setup is quite simple. These are the related classes I used:
>>
>> //----
Harsh J