Any reason a bunch of nearly-identical jobs would suddenly stop working?

So I queued up a batch of jobs last night to run overnight (and into the
day a bit, owing to a bottleneck in the scheduler, the way things
are currently implemented), made sure they were running correctly, went
to sleep, and when I woke up in the morning, they were failing all over
the place.

Since each of these jobs was basically the same Pig script being run with
a different set of parameters, I tried re-running it with the
parameters that it had run (successfully) with the night before, and it
also failed. So I started whittling away at steps to try to find the
origin of the failure, until I was getting a failure even just loading
the initial data and dumping it out. Basically, I've reduced things to
a matter of

apa = LOAD '/rawfiles/08556ecf5c6841d59eb702e9762e649a/{1296432000,1296435600,1296439200,1296442800,1296446400,1296450000,1296453600,1296457200,1296460800,1296464400,1296468000,1296471600,1296475200,1296478800,1296482400,1296486000,1296489600,1296493200,1296496800,1296500400,1296504000,1296507600,1296511200,1296514800}/*/apa'
    USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader(',')
    AS (timestamp:long, type:chararray, appkey:chararray, uid:chararray,
        uniq:chararray, shortUniq:chararray, profUid:chararray,
        addr:chararray, ref:chararray);
dump apa;

and after getting all the happy messages from the loader like:

2011-03-08 21:48:46,454 [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoBaseLoadFunc - Got 117 LZO slices in total.
2011-03-08 21:48:48,044 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-03-08 21:50:17,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete

It went straight to:

2011-03-08 21:50:17,612 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2011-03-08 21:50:17,662 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "hdfs://master.hadoop:9000/tmp/temp-2121884028/tmp-268519128"
2011-03-08 21:50:17,664 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2011-03-08 21:50:17,668 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias apa
Details at logfile: /home/kris/pig_1299620898192.log

And looking at the stack trace in the logfile, I've got:

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias apa

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias apa
        at org.apache.pig.PigServer.openIterator(PigServer.java:482)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:352)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
        at org.apache.pig.PigServer.openIterator(PigServer.java:476)
        ... 6 more
===============================================================================
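
That client-side trace is just Pig's generic failure path, so I gather
the actual exception from the failed map tasks lives in the task
attempt logs on the workers (or in the JobTracker web UI). Something
along these lines, with the log directory adjusted for however the
cluster is set up, is presumably where to dig:

# task logs normally land under ${hadoop.log.dir}/userlogs/<attempt-id>/
# the path below assumes the stock layout; adjust for your install
grep -r "Caused by" $HADOOP_LOG_DIR/userlogs/ | head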
My sysadmin's off on vacation for the week, but left notes on the
scripts to restart the cluster, so I tried that; the problem persists,
so I was hoping someone here might have an idea what's wrong (and how
to fix it).
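
In case it helps narrow things down: if elephant-bird itself were the
suspect, I suppose an even smaller test than the LOAD above would
separate the loader from the cluster. The single-hour path below is
just the first entry from that glob, and PigStorage only yields plain
text if the lzop codec is registered cluster-wide, so treat this as a
sketch:

-- built-in loader instead of LzoTokenizedLoader; assumes the lzop
-- codec is registered so the .lzo files are decompressed transparently
raw = LOAD '/rawfiles/08556ecf5c6841d59eb702e9762e649a/1296432000/*/apa' USING PigStorage(',');
few = LIMIT raw 10;  -- keep the dump small
dump few;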

Thanks,
Kris

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3