Pig, mail # user - pig reduce OOM


pig reduce OOM
Haitao Yao 2012-07-06, 06:44
Hi,
I wrote a Pig script in which one of the reducers always runs out of memory (OOM), no matter how I change the parallelism.
Here's the script snippet:
Data = group SourceData all;
Result = foreach Data generate group, COUNT(SourceData);
store Result into 'XX';

I analyzed the Java heap dump and found that the cause is that the reducer loads all the data into memory for the foreach and count.

Can I re-implement BinSedesTuple so that the reducers avoid loading all the data for the computation?
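
For reference, here is a self-contained sketch of the same pattern. The input path 'input_data' and the single-column schema are assumptions for illustration only and are not taken from the real script. Since COUNT is an algebraic function in Pig, a plain group-all followed by COUNT would normally be expected to be pre-aggregated by the combiner; EXPLAIN prints the plans, which may help check whether a combine step is actually present for this job.

```pig
-- Hypothetical input path and schema, for illustration only.
SourceData = LOAD 'input_data' AS (line:chararray);

-- Group every record into a single bag under the key 'all'.
Data = GROUP SourceData ALL;

-- Count the records in that bag.
Result = FOREACH Data GENERATE group, COUNT(SourceData);

-- Print the logical, physical, and MapReduce plans for Result.
EXPLAIN Result;

STORE Result INTO 'XX';
```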

Here's the object dominator tree:

Here's the jmap result:

 
Haitao Yao
[EMAIL PROTECTED]
weibo: @haitao_yao
Skype:  haitao.yao.final