HDFS, mail # user - OutOfMemory during Plain Java MapReduce


Re: OutOfMemory during Plain Java MapReduce
Harsh J 2013-03-09, 00:55
Paul's way is much easier than the serialization approach I mentioned
earlier. I was looking only at the implementation rather than the logic
used, my bad :)

On Fri, Mar 8, 2013 at 5:39 PM, Paul Wilkinson <[EMAIL PROTECTED]> wrote:
> As always, what Harsh said :)
>
> Looking at your reducer code, it appears that you are trying to compute
> the distinct set of user IDs for a given reduce key. Rather than
> computing this by holding the set in memory, apply a secondary sort to
> the reduce values; then, while iterating over them, look for changes of
> user ID. Whenever it changes, write out the key and the newly found
> value.
>
> Your output will change from this:
>
>   key, [value 1, value2, ... valueN]
>
> to this:
>
>   key, value1
>   key, value2
>        ...
>   key, valueN
>
> Whether this is suitable for your follow-on processing is the next question,
> but this approach will scale to whatever data you can throw at it.
>
> Paul
>
>
> On 8 March 2013 10:57, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> When you implement code that stores copies of every value in memory
>> (even if just for a single key), things are going to break in
>> big-data-land. Practically, post-partitioning, the number of values
>> for a given key can be huge given the source data, so you cannot hold
>> them all in memory and then write in one go. You'd probably need to
>> write out something continuously if you really want to do this, or use
>> an alternative form of key-value storage where updates can be made
>> incrementally (Apache HBase is one such store).
>>
>> This has been discussed before IIRC, and if the goal is to store the
>> outputs in a file, it's better to serialize them directly to an open
>> file instead of keeping them in a data structure and serializing them
>> at the end. The caveats that apply if you open your own file from a
>> task are described at
>>
>> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F.
>>
>> On Fri, Mar 8, 2013 at 4:35 AM, Christian Schneider
>> <[EMAIL PROTECTED]> wrote:
>> > I had a look to the stacktrace and it says the problem is at the
>> > reducer:
>> > userSet.add(iterator.next().toString());
>> >
>> > Error: Java heap space
>> > attempt_201303072200_0016_r_000002_0: WARN : mapreduce.Counters - Group
>> > org.apache.hadoop.mapred.Task$Counter is deprecated. Use
>> > org.apache.hadoop.mapreduce.TaskCounter instead
>> > attempt_201303072200_0016_r_000002_0: WARN :
>> > org.apache.hadoop.conf.Configuration - session.id is deprecated.
>> > Instead,
>> > use dfs.metrics.session-id
>> > attempt_201303072200_0016_r_000002_0: WARN :
>> > org.apache.hadoop.conf.Configuration - slave.host.name is deprecated.
>> > Instead, use dfs.datanode.hostname
>> > attempt_201303072200_0016_r_000002_0: FATAL:
>> > org.apache.hadoop.mapred.Child
>> > - Error running child : java.lang.OutOfMemoryError: Java heap space
>> > attempt_201303072200_0016_r_000002_0: at
>> > java.util.Arrays.copyOfRange(Arrays.java:3209)
>> > attempt_201303072200_0016_r_000002_0: at
>> > java.lang.String.<init>(String.java:215)
>> > attempt_201303072200_0016_r_000002_0: at
>> > java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
>> > attempt_201303072200_0016_r_000002_0: at
>> > java.nio.CharBuffer.toString(CharBuffer.java:1157)
>> > attempt_201303072200_0016_r_000002_0: at
>> > org.apache.hadoop.io.Text.decode(Text.java:394)
>> > attempt_201303072200_0016_r_000002_0: at
>> > org.apache.hadoop.io.Text.decode(Text.java:371)
>> > attempt_201303072200_0016_r_000002_0: at
>> > org.apache.hadoop.io.Text.toString(Text.java:273)
>> > attempt_201303072200_0016_r_000002_0: at
>> > com.myCompany.UserToAppReducer.reduce(RankingReducer.java:21)
>> > attempt_201303072200_0016_r_000002_0: at
>> > com.myCompany.UserToAppReducer.reduce(RankingReducer.java:1)
>> > attempt_201303072200_0016_r_000002_0: at
>> > org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)

Harsh J
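Harsh's point above about writing out continuously rather than serializing at the end can be sketched outside Hadoop as well. Here a `StringWriter` stands in for the task's output stream (the record format and names are illustrative, not from the thread):

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

public class StreamingSerialize {
    // Append each key/value pair to the sink as soon as it is produced,
    // so memory use stays constant instead of growing with the output.
    static void writeRecords(List<String[]> records, Appendable out)
            throws IOException {
        for (String[] kv : records) {
            out.append(kv[0]).append('\t').append(kv[1]).append('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter(); // stands in for a real stream
        writeRecords(Arrays.asList(
                new String[]{"key", "value1"},
                new String[]{"key", "value2"}), sink);
        System.out.print(sink); // key\tvalue1 then key\tvalue2, one per line
    }
}
```

In an actual reducer you would not open files yourself without reading the FAQ entry Harsh links; the normal route is `context.write(...)`, which already streams records out incrementally.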