-
OutOfMemoryError of PIG job (UDF loads big file)
jiang licht 2010-02-23, 01:43
I am running a Hadoop job written in Pig. It fails with an out-of-memory error because a UDF loads a big file and consumes a lot of memory. Which settings avoid the following OutOfMemoryError? I guess that simply giving Pig itself more memory (java -XmxBIGmemory org.apache.pig.Main ...) won't work.
Error message --->
java.lang.OutOfMemoryError: Java heap space
    at java.util.regex.Pattern.compile(Pattern.java:1451)
    at java.util.regex.Pattern.<init>(Pattern.java:1133)
    at java.util.regex.Pattern.compile(Pattern.java:823)
    at java.lang.String.split(String.java:2293)
    at java.lang.String.split(String.java:2335)
    at UDF.load(Unknown Source)
    at UDF.load(Unknown Source)
    at UDF.exec(Unknown Source)
    at UDF.exec(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:278)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
Thanks! Michael
-
Re: OutOfMemoryError of PIG job (UDF loads big file)
Jeff Zhang 2010-02-23, 02:13
Hi Jiang,
You should set the property *mapred.child.java.opts* in mapred-site.xml to increase the memory, as follows:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
and then restart your hadoop cluster
--
Best Regards
Jeff Zhang
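To check whether the new heap limit actually reaches the task JVMs, one option is to log the maximum heap from inside the UDF. The sketch below is only an illustration: the class name HeapCheck is hypothetical and not part of Pig or Hadoop, and it relies solely on the standard Runtime.maxMemory() call.

```java
// Hypothetical helper, not part of Pig or Hadoop:
// reports the heap limit of the JVM it is running in.
final class HeapCheck {

    /** Returns the maximum heap this JVM will attempt to use, in whole megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Writing to stderr so the line would show up in the task's stderr log.
        System.err.println("max heap (MB): " + maxHeapMb());
    }
}
```

Called once from the UDF, the reported value should be in the neighborhood of 1024 when -Xmx1024m is in effect; the JVM may report somewhat less than the configured figure.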
-
Re: OutOfMemoryError of PIG job (UDF loads big file)
jiang licht 2010-02-23, 02:36
Thanks Jeff. I also just found this one and solved my problem. BTW, so many settings to play with :)
Michael
-
Re: OutOfMemoryError of PIG job (UDF loads big file)
Ankur C. Goel 2010-02-23, 08:11
Yeah! Wait till you stumble across the need to adjust shuffle/reduce buffers, reuse JVMs, sort factor, copier threads ......... :-)
-
Re: OutOfMemoryError of PIG job (UDF loads big file)
jiang licht 2010-02-23, 17:30
Hm, I'm wondering whether there are case studies posted somewhere on how people handle memory-related issues, which could serve as good references?
Thanks,
Michael
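Besides raising the heap, one pattern that is sometimes used for a UDF that depends on a side file is to load it once per task JVM and to parse it without a regex. The trace in this thread shows String.split compiling a Pattern inside UDF.load, called from UDF.exec. Whether that load ran for every record is not known from the thread, so the following is a sketch under assumptions: the class LookupTable, its Opener interface, and the tab-separated "key<TAB>value" file format are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the class name and the "key<TAB>value" line format are assumptions.
final class LookupTable {

    /** Supplies the data on demand, so nothing is opened once the table is cached. */
    interface Opener {
        Reader open() throws IOException;
    }

    private static Map<String, String> cache; // loaded at most once per JVM

    /** Parses "key<TAB>value" lines with indexOf/substring instead of String.split. */
    static Map<String, String> parse(Reader source) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines that have no tab separator
                }
                table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return table;
    }

    /** Returns the cached table, loading it on the first call only. */
    static synchronized Map<String, String> get(Opener opener) throws IOException {
        if (cache == null) {
            cache = parse(opener.open());
        }
        return cache;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> table = get(() -> new StringReader("a\t1\nb\t2\n"));
        System.out.println(table.get("b"));
    }
}
```

In a real UDF the Opener would open the side file, and exec would call get on each record and receive the already-built map after the first call. The table itself still has to fit in the task heap, so this complements the mapred.child.java.opts setting rather than replacing it.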