Pig, mail # user - Re: java.lang.OutOfMemoryError: Java heap space - 2014-02-07, 12:37
 Search Hadoop and all its subprojects:
Copy link to this message
-
Re: java.lang.OutOfMemoryError: Java heap space
Thanks Park for sharing the above configs

But I am wondering if the above config changes would make any huge
difference in my case.
As per my logs, I am very worried about this line -

 INFO org.apache.hadoop.mapred.MapTask: Record too large for in-memory
buffer: 644245358 bytes

If I am understanding it properly, my 1 record is very large to fit
into the memory, which is causing the issue.
Any of the above changes wouldn't make any huge impact, please correct
me if I am taking it totally wrong.

 - Adding hadoop user group here as well, to throw some valuable
inputs to understand the above question.
Since I am doing a join on a grouped bag, do you think that might be the case ?

But if that is the issue, as far as I understand Bags in Pig are
spillable, it shouldn't have given this issue.

I can't get rid of group by, Grouping by first should idealing improve
my join. But if this is the root cause, if I am understanding it
correctly,

do you think I should get rid of group-by.

But my question in that case would be what would happen if I do group
by later after join, if will result in much bigger bag (because it
would have more records after join)

Am I thinking here correctly ?

Regards

Prav

On Fri, Feb 7, 2014 at 3:11 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB