Re: OutOfMemory during Plain Java MapReduce
Harsh J 2013-03-09, 00:55
Paul's way is much easier than the serialization approach I
mentioned earlier. I only looked at the implementation and didn't pay
attention to the logic used, my bad :)

On Fri, Mar 8, 2013 at 5:39 PM, Paul Wilkinson <[EMAIL PROTECTED]> wrote:
> As always, what Harsh said :)
>
> Looking at your reducer code, it appears that you are trying to compute the
> distinct set of user IDs for a given reduce key. Rather than computing this
> by holding the set in memory, use a secondary sort of the reduce values,
> then while iterating over the reduce values, look for changes of user id.
> Whenever it changes, write out the key and the newly found value.
>
> Your output will change from this:
>
> key, [value1, value2, ... valueN]
>
> to this:
>
> key, value1
> key, value2
> ...
> key, valueN
>
> Whether this is suitable for your follow-on processing is the next question,
> but this approach will scale to whatever data you can throw at it.
>
> Paul
>
>
> On 8 March 2013 10:57, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> When you implement code that starts memory-storing value copies for
>> every record (even if of just a single key), things are going to break
>> in big-data-land. Practically, post-partitioning, the # of values for
>> a given key can be huge given the source data, so you cannot hold it
>> all in and then write in one go. You'd probably need to write out
>> something continuously if you really really want to do this, or use an
>> alternative form of key-value storage where updates can be made
>> incrementally (Apache HBase is such a store, as one example).
>>
>> This has been discussed before IIRC, and if the goal were to store the
>> outputs onto a file then it's better to just directly serialize them
>> with a file opened instead of keeping them in a data structure and
>> serializing it at the end. The caveats that'd apply if you were to
>> open your own file from a task are described at
>>
>> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F.
>>
>> On Fri, Mar 8, 2013 at 4:35 AM, Christian Schneider
>> <[EMAIL PROTECTED]> wrote:
>> > I had a look to the stacktrace and it says the problem is at the
>> > reducer:
>> > userSet.add(iterator.next().toString());
>> >
>> > Error: Java heap space
>> > attempt_201303072200_0016_r_000002_0: WARN : mapreduce.Counters - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
>> > attempt_201303072200_0016_r_000002_0: WARN : org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead, use dfs.metrics.session-id
>> > attempt_201303072200_0016_r_000002_0: WARN : org.apache.hadoop.conf.Configuration - slave.host.name is deprecated. Instead, use dfs.datanode.hostname
>> > attempt_201303072200_0016_r_000002_0: FATAL: org.apache.hadoop.mapred.Child - Error running child : java.lang.OutOfMemoryError: Java heap space
>> > attempt_201303072200_0016_r_000002_0: at java.util.Arrays.copyOfRange(Arrays.java:3209)
>> > attempt_201303072200_0016_r_000002_0: at java.lang.String.<init>(String.java:215)
>> > attempt_201303072200_0016_r_000002_0: at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
>> > attempt_201303072200_0016_r_000002_0: at java.nio.CharBuffer.toString(CharBuffer.java:1157)
>> > attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.io.Text.decode(Text.java:394)
>> > attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.io.Text.decode(Text.java:371)
>> > attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.io.Text.toString(Text.java:273)
>> > attempt_201303072200_0016_r_000002_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:21)
>> > attempt_201303072200_0016_r_000002_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:1)
>> > attempt_201303072200_0016_r_000002_0: at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)

Harsh J
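
For illustration, the "look for changes of user id" loop that Paul describes can be sketched in plain Java with no Hadoop dependency. This is only a minimal sketch, assuming the values for a key already arrive in sorted order (which is what a secondary sort would provide in a real job). The class name DistinctSortedValues, the method emitDistinct and the BiConsumer standing in for the reducer's output are hypothetical names chosen for this example, not anything from the original reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

public class DistinctSortedValues {

    /**
     * Emits (key, value) once per distinct value.
     * Assumption: sortedValues yields equal values next to each other,
     * as a secondary sort would guarantee. Only the previous value is
     * kept in memory, never the whole set.
     */
    public static void emitDistinct(String key,
                                    Iterator<String> sortedValues,
                                    BiConsumer<String, String> out) {
        String previous = null;
        while (sortedValues.hasNext()) {
            String current = sortedValues.next();
            // A change of value marks a newly found distinct entry.
            if (!current.equals(previous)) {
                out.accept(key, current);
                previous = current;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data: user IDs for one key, already sorted.
        List<String> userIds = Arrays.asList("u1", "u1", "u2", "u3", "u3");
        List<String> written = new ArrayList<>();

        emitDistinct("app1", userIds.iterator(),
                (k, v) -> written.add(k + ", " + v));

        // One output line per distinct user ID.
        written.forEach(System.out::println);
    }
}
```

In an actual reducer the same loop would run over the reduce values and call the context's write method in place of the BiConsumer. Keep in mind that Hadoop reuses the value object between iterations, so the previous value has to be copied (for example into a String or a separate Text) rather than kept as a reference.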