Pig >> mail # user >> pig reduce OOM


pig reduce OOM
Hi,
I wrote a Pig script in which one of the reducers always runs out of memory (OOM), no matter how I change the parallelism.
Here's the script snippet:
Data = group SourceData all;
Result = foreach Data generate group, COUNT(SourceData);
store Result into 'XX';

I analyzed the dumped Java heap and found that the cause is the reducer loading all the data into memory for the foreach and COUNT.
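
To make the behaviour described above concrete, here is a minimal Java sketch of the difference as I understand it. It is not Pig's actual code: the class and method names are hypothetical and exist only for illustration. The first method stands in for a reducer that holds every record in memory before counting; the other two show that a count only needs a running total and that partial totals can be summed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.LongStream;

// Hypothetical illustration only; none of these names are Pig classes.
class CountSketch {

    // Materializing approach: copy every record into a list (a stand-in for
    // a bag held fully in memory), then take its size. Memory use grows
    // with the number of records.
    static long countMaterialized(Iterator<Long> records) {
        List<Long> bag = new ArrayList<>();
        while (records.hasNext()) {
            bag.add(records.next());
        }
        return bag.size();
    }

    // Streaming approach: consume each record and keep only a running
    // counter. Memory use stays constant regardless of input size.
    static long countStreaming(Iterator<Long> records) {
        long n = 0;
        while (records.hasNext()) {
            records.next();
            n++;
        }
        return n;
    }

    // Counts computed independently over separate chunks of the input
    // can be merged by summing them.
    static long mergePartialCounts(long... partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        long materialized = countMaterialized(LongStream.range(0, 1000).boxed().iterator());
        long streamed = countStreaming(LongStream.range(0, 1000).boxed().iterator());
        long merged = mergePartialCounts(
                countStreaming(LongStream.range(0, 400).boxed().iterator()),
                countStreaming(LongStream.range(400, 1000).boxed().iterator()));
        System.out.println(materialized + " " + streamed + " " + merged);
    }
}
```

All three approaches give the same count; only the first one needs memory proportional to the input, which is what the heap dump suggests is happening in my reducer.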

Can I re-implement BinSedesTuple so that the reducers avoid loading all the data for the computation?

Here's the object dominator tree:

Here's the jmap result:

 
Haitao Yao
[EMAIL PROTECTED]
weibo: @haitao_yao
Skype:  haitao.yao.final
