Re: Any reason a bunch of nearly-identical jobs would suddenly stop working?
Question, do normal map-reduce jobs run on this cluster? Like the example jar jobs?
Guy
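
A quick way to sanity-check plain MapReduce along those lines is the bundled examples jar; the jar name and path below are guesses for a 0.20-era install and vary by distribution:

    # Run the bundled pi estimator: 10 maps, 1000 samples per map.
    # Adjust the jar name/path to match the local Hadoop install.
    hadoop jar $HADOOP_HOME/hadoop-*-examples.jar pi 10 1000

If that also fails, the problem is likely in the cluster itself rather than in Pig or the LZO loader.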

On Mar 9, 2011, at 2:29 PM, Kris Coward <[EMAIL PROTECTED]> wrote:

>
> Also, reading some uncompressed data off the same cluster using
> PigStorage shows a failure to even read the data in the first place :|
>
> -K
>
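A minimal PigStorage sanity check along those lines might look like the following; the path is hypothetical and should point at any small uncompressed, comma-delimited file on HDFS:

    -- Load without a schema and dump a handful of rows.
    raw = LOAD '/tmp/sanity_check.csv' USING PigStorage(',');
    few = LIMIT raw 10;
    DUMP few;
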
> On Tue, Mar 08, 2011 at 09:24:18PM -0500, Kris Coward wrote:
>>
>> None of the nodes have more than 20% utilization on any of their disks;
>> so it must be the cluster figuring that it can get away with this sort
>> of thing when the sysadmin's not around to set it straight... clearly a
>> cluster of redundant/load-sharing sysadmins is also needed :)
>>
>> -K
>>
>> On Tue, Mar 08, 2011 at 03:24:50PM -0800, Dmitriy Ryaboy wrote:
>>> Check task logs. I am guessing you ran out of either hdfs or local disk on
>>> the nodes.
>>>
>>> Also, never let your sysadmin go on vacation, that's what makes things
>>> break! :)
>>>
>>> D
>>>
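A few rough capacity checks for that, assuming a 0.20-era CLI (command names vary slightly between versions):

    # DFS used/remaining per datanode.
    hadoop dfsadmin -report
    # Space consumed under the HDFS temp directory.
    hadoop fs -du /tmp
    # Local disk on each tasktracker node (run on the nodes themselves).
    df -h
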
>>> On Tue, Mar 8, 2011 at 2:53 PM, Kris Coward <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> So I queued up a batch of jobs last night to run overnight (and into the
>>>> day a bit, owing to a bottleneck on the scheduler the way that things
>>>> are currently implemented), made sure they were running correctly, went
>>>> to sleep, and when I woke up in the morning, they were failing all over
>>>> the place.
>>>>
>>>> Since each of these jobs was basically the same pig script being run with
>>>> a different set of parameters, I tried re-running it with the
>>>> parameters that it had run (successfully) with the night before, and it
>>>> also failed. So I started whittling away at steps to try and find the
>>>> origin of the failure, until I was even getting a failure loading the
>>>> initial data, and dumping it out. Basically, I've reduced things to a
>>>> matter of
>>>>
>>>> apa = LOAD
>>>> '/rawfiles/08556ecf5c6841d59eb702e9762e649a/{1296432000,1296435600,1296439200,1296442800,1296446400,1296450000,1296453600,1296457200,1296460800,1296464400,1296468000,1296471600,1296475200,1296478800,1296482400,1296486000,1296489600,1296493200,1296496800,1296500400,1296504000,1296507600,1296511200,1296514800}/*/apa'
>>>> USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader(',') AS
>>>> (timestamp:long, type:chararray, appkey:chararray, uid:chararray,
>>>> uniq:chararray, shortUniq:chararray, profUid:chararray, addr:chararray,
>>>> ref:chararray);
>>>> dump apa;
>>>>
>>>> and after getting all the happy messages from the loader like:
>>>>
>>>> 2011-03-08 21:48:46,454 [Thread-12] INFO
>>>> com.twitter.elephantbird.pig.load.LzoBaseLoadFunc - Got 117 LZO slices in
>>>> total.
>>>> 2011-03-08 21:48:48,044 [main] INFO
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>> - 0% complete
>>>> 2011-03-08 21:50:17,612 [main] INFO
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>> - 100% complete
>>>>
>>>> It went straight to:
>>>>
>>>> 2011-03-08 21:50:17,612 [main] ERROR
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>> - 1 map reduce job(s) failed!
>>>> 2011-03-08 21:50:17,662 [main] ERROR
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>> - Failed to produce result in:
>>>> "hdfs://master.hadoop:9000/tmp/temp-2121884028/tmp-268519128"
>>>> 2011-03-08 21:50:17,664 [main] INFO
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>> - Failed!
>>>> 2011-03-08 21:50:17,668 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>> ERROR 1066: Unable to open iterator for alias apa
>>>> Details at logfile: /home/kris/pig_1299620898192.log
>>>>
>>>> And looking at the stack trace in the logfile, I've got:
>>>>
>>>> Pig Stack Trace
>>>> ---------------
>>>> ERROR 1066: Unable to open iterator for alias apa
>>>>
>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
>>>> open iterator for alias apa
>>>>       at org.apache.pig.PigServer.openIterator(PigServer.java:482)