Pig, mail # user - Query Help


RE: Query Help
Santhosh Srinivasan 2009-02-25, 00:00
Tamar,

When PIG-646 was fixed, we did not see this behaviour. Can you file a
JIRA and provide a representative script that will produce this error?
If you can add more information regarding the size of your inputs, etc.,
it will aid us in reproducing the error.

Thanks,
Santhosh

-----Original Message-----
From: Tamir Kamara [mailto:[EMAIL PROTECTED]]
Sent: Saturday, February 21, 2009 2:38 AM
To: [EMAIL PROTECTED]
Subject: Re: Query Help

Hey,

I also seem to be having many map tasks being killed because no progress is reported. I think this is due to the DISTINCT UDF, which in my case can have tens of millions of tuples to go through.
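
The kind of statement involved looks roughly like this (relation and field names here are made up for illustration, not taken from my actual script):

logs    = LOAD 'input' AS (user, domain);
grouped = GROUP logs BY domain;
counts  = FOREACH grouped {
            -- the nested DISTINCT is what Pig evaluates through
            -- org.apache.pig.builtin.Distinct$Intermediate in the combiner
            users = DISTINCT logs.user;
            GENERATE group, COUNT(users);
          };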

I'm seeing numerous errors like this one in the task logs:
2009-02-21 11:41:57,727 WARN org.apache.pig.builtin.Distinct$Intermediate: No reporter object provided to UDF org.apache.pig.builtin.Distinct$Intermediate
Later the task gets killed because it fails to report status for 600 seconds.

It looks like this problem was resolved a few weeks ago by https://issues.apache.org/jira/browse/PIG-646, and I'm working with the latest trunk.

Is this really the same issue or is it something else?

Thanks,
Tamir
On Fri, Feb 20, 2009 at 8:14 AM, Tamir Kamara <[EMAIL PROTECTED]> wrote:

> Thanks Alan!
>
> I've tried to switch the files in the join statement and now the first pig
> job responsible for the join succeeded. However, the second pig job fails
> during its map phase soon after it starts because too many map tasks fail.
> The error I'm getting for almost all tasks is "Spill failed" (more details
> below). What does this mean when it happens in the map tasks of the second
> pig job?
>
> Thanks in advance,
> Tamir
>
> java.io.IOException: Spill failed
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:589)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:570)
>     at java.io.DataOutputStream.writeBoolean(Unknown Source)
>     at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:82)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:431)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:100)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:205)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:194)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:85)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at java.util.HashMap.resize(Unknown Source)
>     at java.util.HashMap.addEntry(Unknown Source)
>     at java.util.HashMap.put(Unknown Source)
>     at java.util.HashSet.add(Unknown Source)
>     at org.apache.pig.data.DistinctDataBag.add(DistinctDataBag.java:104)
>     at org.apache.pig.builtin.Distinct.getDistinctFromNestedBags(Distinct.java:127)
>     at org.apache.pig.builtin.Distinct.access$200(Distinct.java:39)
>     at org.apache.pig.builtin.Distinct$Intermediate.exec(Distinct.java:102)
>     at org.apache.pig.builtin.Distinct$Intermediate.exec(Distinct.java:95)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:187)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:221)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:248)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:198)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:200)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:173)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:151)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:58)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:904)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:785)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1600(MapTask.java:286)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:712)
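
From the trace above, the OutOfMemoryError is raised while DistinctDataBag.add builds the distinct set inside the combiner (combineAndSpill), i.e. the entire bag for a key is being held in memory on the map side. If the script can be restructured, one possible workaround sketch (again with made-up names) is to apply DISTINCT as a top-level operator before grouping, so deduplication happens in the sort/shuffle rather than in a single in-memory set:

pairs   = FOREACH logs GENERATE domain, user;
dpairs  = DISTINCT pairs;  -- top-level DISTINCT dedups via sort/shuffle,
                           -- not via one in-memory HashSet
grouped = GROUP dpairs BY domain;
counts  = FOREACH grouped GENERATE group, COUNT(dpairs);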