too many memory spills
Panshul Whisper 2013-03-06, 14:28
Hello,
I have a 9 GB file with approximately 109.5 million records. I run a Pig script on this file that does the following:

1. Groups by a field of the file
2. Counts the number of records in every group
3. Stores the result in a CSV file using the normal PigStorage(",")

The job completes successfully, but the job details show a lot of memory spills. *Out of 109.5 million records, it shows approximately 48 million records spilled.*

I am running it on a *4-node cluster, each node with a dual-core processor and 4 GB of RAM*.

How can I minimize the number of spilled records? Execution becomes really slow once the spilling starts.

Any suggestions are welcome.

Thanking you,
--
Regards,
Ouch Whisper
010101010101
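
For reference, the three steps described above could look roughly like the following Pig Latin sketch. The post does not include the script itself, so the input path, the comma-delimited input format, the schema, the field names, and the output path are all assumptions made for illustration.

```pig
-- Hypothetical input path and schema; the post does not give them.
-- Assumes the 9 GB input is comma-delimited.
records = LOAD 'input/records.csv' USING PigStorage(',')
          AS (key:chararray, value:chararray);

-- Step 1: group by one field of the file.
grouped = GROUP records BY key;

-- Step 2: count the records in every group.
-- COUNT skips tuples whose first field is null; COUNT_STAR would include them.
counts = FOREACH grouped GENERATE group AS key, COUNT(records) AS record_count;

-- Step 3: store the result as CSV using PigStorage(",").
-- Hypothetical output path.
STORE counts INTO 'output/counts' USING PigStorage(',');
```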