Pig, mail # user - Re: too many memory spills


Re: too many memory spills
Prashant Kommireddi 2013-03-07, 08:05
Are these spills happening on map or reduce side? What is the memory
allocated to each TaskTracker?
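For reference, in Hadoop 1.x (current when this thread was written) the per-task heap and the map-side sort buffer — whose overflow is what triggers map-side spills — are set in mapred-site.xml. A minimal sketch; the values below are illustrative assumptions, not recommendations:

```xml
<!-- mapred-site.xml (Hadoop 1.x property names); example values only -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value> <!-- heap given to each map/reduce task JVM -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>200</value> <!-- map-side sort buffer; records spill to disk when it fills -->
</property>
```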

On Wed, Mar 6, 2013 at 6:28 AM, Panshul Whisper <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have a 9 GB file with approximately 109.5 million records. I execute a
> Pig script on this file that does the following:
> 1. Group by on a field of the file
> 2. Count the number of records in each group
> 3. Store the result in a CSV file using the default PigStorage(",")
>
> The job is completed successfully but the job details show a lot of memory
> spills. *Out of 109.5 million records, it shows approximately 48 million
> records spilled.*
>
> I am executing it on a *4-node cluster, each node with a dual-core
> processor and 4 GB RAM*.
>
> How can I minimize the number of record spills? Execution becomes very
> slow once the spilling starts.
>
> Any suggestions are welcome.
>
> Thanking You,
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>
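The script Panshul describes (group by a field, count per group, store as CSV) can be sketched in Pig Latin as follows; the input path, field names, and aliases are assumptions for illustration:

```pig
-- Minimal sketch of the described job; path and schema are assumed
records = LOAD 'input/data.csv' USING PigStorage(',')
          AS (key:chararray, rest:chararray);
grouped = GROUP records BY key;                      -- step 1: group by a field
counts  = FOREACH grouped
          GENERATE group AS key, COUNT(records) AS n; -- step 2: count per group
STORE counts INTO 'output/counts' USING PigStorage(','); -- step 3: CSV output
```

One relevant knob on the Pig side: Pig proactively spills large bags (such as those built by GROUP) to disk once they exceed the fraction of the task heap set by pig.cachedbag.memusage (default 0.2), so raising that fraction or the task heap itself can reduce spill counts.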
Follow-up replies in this thread:
- Panshul Whisper 2013-03-07, 21:01
- Prashant Kommireddi 2013-03-08, 00:25
- Norbert Burger 2013-03-08, 02:47