Large Bag (100GB of Data) in Reduce Step
Hi Pig users,

I have a question regarding how to handle a large bag of data in the reduce
step. After I run the script below, each group ends up with about 100GB of
data to process. The bag is spilled continuously and the job is very slow.
What would you recommend for speeding up processing when you find yourself
with a large bag of data (over 100GB) per group?

A = LOAD '/tmp/data';
B = GROUP A BY $0;
C = FOREACH B GENERATE FLATTEN($1); -- this takes very, very long because of the large bag
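
For illustration, here is a minimal sketch of one common mitigation: project
only the fields that are actually needed before the GROUP, and raise reducer
parallelism so groups are spread over more reducers. The schema names, the
assumption that only two columns matter, and the PARALLEL value are all
placeholders, not part of the original script, and a single 100GB key will
still land on one reducer:

-- hedged sketch: assumes tab-delimited input with a key in the first column
-- and one value column needed downstream; names and PARALLEL are placeholders
A  = LOAD '/tmp/data' AS (key:chararray, val:chararray);
A1 = FOREACH A GENERATE key, val;        -- keep only the required fields
B  = GROUP A1 BY key PARALLEL 50;        -- spread groups across more reducers
C  = FOREACH B GENERATE FLATTEN(A1.val); -- still spills if one key holds 100GB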

Best Regards,

Jerry