Re: OutOfMemory during Plain Java MapReduce
Harsh J 2013-03-08, 10:57
Hi,
When your code keeps an in-memory copy of every value it sees (even for just a single key), things are going to break in big-data-land. After partitioning, the number of values for a given key can be huge, depending on the source data, so you cannot hold them all in memory and then write them out in one go.

If you really, really want to do this, you'd probably need to write the output continuously as you iterate, or use an alternative form of key-value storage where updates can be made incrementally (Apache HBase is one example of such a store).

This has been discussed before, IIRC. If the goal is to store the outputs in a file, it's better to open the file and serialize each record to it directly, instead of keeping everything in a data structure and serializing it at the end. The caveats that apply if you open your own file from a task are described at:
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F

On Fri, Mar 8, 2013 at 4:35 AM, Christian Schneider <[EMAIL PROTECTED]> wrote:
> I had a look to the stacktrace and it says the problem is at the reducer:
> userSet.add(iterator.next().toString());
>
> Error: Java heap space
> attempt_201303072200_0016_r_000002_0: WARN : mapreduce.Counters - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
> attempt_201303072200_0016_r_000002_0: WARN : org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead, use dfs.metrics.session-id
> attempt_201303072200_0016_r_000002_0: WARN : org.apache.hadoop.conf.Configuration - slave.host.name is deprecated. Instead, use dfs.datanode.hostname
> attempt_201303072200_0016_r_000002_0: FATAL: org.apache.hadoop.mapred.Child - Error running child : java.lang.OutOfMemoryError: Java heap space
> attempt_201303072200_0016_r_000002_0: at java.util.Arrays.copyOfRange(Arrays.java:3209)
> attempt_201303072200_0016_r_000002_0: at java.lang.String.<init>(String.java:215)
> attempt_201303072200_0016_r_000002_0: at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
> attempt_201303072200_0016_r_000002_0: at java.nio.CharBuffer.toString(CharBuffer.java:1157)
> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.io.Text.decode(Text.java:394)
> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.io.Text.decode(Text.java:371)
> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.io.Text.toString(Text.java:273)
> attempt_201303072200_0016_r_000002_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:21)
> attempt_201303072200_0016_r_000002_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:1)
> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> attempt_201303072200_0016_r_000002_0: at java.security.AccessController.doPrivileged(Native Method)
> attempt_201303072200_0016_r_000002_0: at javax.security.auth.Subject.doAs(Subject.java:396)
> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapred.Child.main(Child.java:262)
>
> But how to solve this?
>
>
> 2013/3/7 Christian Schneider <[EMAIL PROTECTED]>
>>
>> Hi,
>> during the Reduce phase or afterwards (i don't really know how to debug it) I get a heap out of Memory Exception.
>>
>> I guess this is because the value of the reduce task (a Custom Writable) holds a List with a lot of user ids.
>> The Setup is quite simple. This are the related classes I used:
>>
>> //----

Harsh J
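
To illustrate the "write out continuously" suggestion in the reply, here is a minimal plain-Java sketch of the idea. It is not the original poster's reducer: the class name `StreamingReduceSketch` and the method `writeValues` are hypothetical, and a `PrintWriter` stands in for whatever output the task really uses. The point is only that each value is written as soon as it is read from the iterator, rather than being added to a collection such as `userSet` first.

```java
import java.io.PrintWriter;
import java.util.Iterator;

// Illustrative only: a plain-Java stand-in for the body of a reduce() call.
// The class and method names are hypothetical, not taken from the original job.
class StreamingReduceSketch {

    // Writes one "key<TAB>value" line per value as soon as it is read, so
    // nothing accumulates in memory however many values the key has.
    // Returns the number of values written.
    static long writeValues(String key, Iterator<String> values, PrintWriter out) {
        long count = 0;
        while (values.hasNext()) {
            out.print(key);
            out.print('\t');
            out.print(values.next());
            out.print('\n');
            count++;
        }
        return count;
    }
}
```

In an actual Hadoop reducer the same loop shape would presumably call `context.write(key, value)` once per value instead of printing. Note that this sketch does not remove duplicates, which the original `userSet` may have been doing; deduplicating without buffering would need a different approach (for example, having the values arrive sorted) and is outside what is shown here.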