Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
HDFS >> mail # user >> too many memory spills


Copy link to this message
-
too many memory spills
Hello,

I have a file of size 9GB and having approximately 109.5 million records.
I execute a pig script on this file that is doing:
1. Group by on a field of the file
2. Count number of records in every group
3. Store the result in a CSV file using normal PigStorage(",")

The job is completed successfully but the job details show a lot of memory
spills. *Out of 109.5 million records, it shows approximately 48 million
records spilled.*

I am executing it on a* 4 node cluster each with a dual core processor and
4GB ram*.

How can I minimize the amount of record spills. It really makes the
execution really slow when the spilling starts.

Any suggestions are welcome.

Thanking You,

--
Regards,
Ouch Whisper
010101010101
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB