

Re: how to figure out the range of a split that failed?
Hi,

> I am running a MapReduce job on my Hadoop cluster.
>
> The job processes 10 gigabytes of data, and one tiny failed task crashes the
> whole operation.
> It gets up to 98% complete, and throwing away all the finished data seems
> like an awful waste.
> I'd like to keep the finished data and re-run only the failed splits (the
> remaining 2%).
>
> Is there any way to figure out the range of the splits that failed?
> I went to "localhost:50030" to see if I could find any useful information,
> but I must be looking in the wrong places.

Can you check the 'Skipping Bad Records' feature described here and see if
it helps?
http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html#Skipping+Bad+Records

Thanks
Hemanth
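(Note that the task log below shows minRecWrittenToEnableSkip_ at
Long.MAX_VALUE, which suggests skipping was not enabled for this job. A
minimal sketch of re-running the streaming job with skipping turned on might
look like the following; the property names follow the 0.20 MapReduce
tutorial linked above, but you should verify them against your Hadoop
version, and all paths, input/output directories, and script names here are
placeholders.)

```shell
# Hedged sketch: enable bad-record skipping for a streaming job so that a
# few unprocessable records are skipped instead of the task (and job)
# failing outright after exhausting its attempts.
# - mapred.skip.map.max.skip.records: acceptable number of records to skip
#   around a bad one (0 disables skipping)
# - mapred.skip.attempts.to.start.skipping: task attempt after which skip
#   mode kicks in
# - mapred.map.max.attempts: raised so there are attempts left once skip
#   mode is active
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
  -D mapred.skip.map.max.skip.records=1 \
  -D mapred.skip.attempts.to.start.skipping=2 \
  -D mapred.map.max.attempts=6 \
  -input /user/hadoop/input \
  -output /user/hadoop/output \
  -mapper my_mapper.py \
  -reducer my_reducer.py
```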

>
> Could somebody help me with this problem?
>
>
> Below is the log of a failed task. Any information I can use?
>
> *syslog logs*
>
> Records R/W=41707/41639
> 2010-06-30 07:35:30,530 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=41776/41726
> 2010-06-30 07:35:40,554 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=41865/41804
> 2010-06-30 07:35:50,559 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=41970/41932
> 2010-06-30 07:36:00,637 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42073/42065
> 2010-06-30 07:36:10,772 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42258/42196
> 2010-06-30 07:36:20,785 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42318/42274
> 2010-06-30 07:36:30,985 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42378/42351
> 2010-06-30 07:36:41,005 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42442/42419
> 2010-06-30 07:36:51,149 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42499/42484
> 2010-06-30 07:37:01,235 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42559/42547
> 2010-06-30 07:37:11,242 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42626/42611
> 2010-06-30 07:37:21,485 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42769/42704
> 2010-06-30 07:37:31,617 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42845/42782
> 2010-06-30 07:37:41,725 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42915/42875
> 2010-06-30 07:37:51,733 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=42986/42949
> 2010-06-30 07:38:01,795 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=43070/43051
> 2010-06-30 07:38:11,849 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=43138/43136
> 2010-06-30 07:38:22,398 INFO org.apache.hadoop.streaming.PipeMapRed:
> Records R/W=43258/43200
> 2010-06-30 07:38:31,642 INFO org.apache.hadoop.streaming.PipeMapRed:
> MRErrorThread done
> 2010-06-30 07:38:31,643 INFO org.apache.hadoop.streaming.PipeMapRed:
> MROutputThread done
> 2010-06-30 07:38:31,765 INFO org.apache.hadoop.streaming.PipeMapRed: log:null
> R/W/S=43335/43271/0 in:7=43335/5885 [rec/s] out:7=43271/5885 [rec/s]
> minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
> HOST=null
> USER=hadoop
> HADOOP_USER=null
> last Hadoop input: |null|
> last tool output: |[B@d22860|
> Date: Wed Jun 30 07:38:31 KST 2010
> java.io.IOException: Broken pipe
>        at java.io.FileOutputStream.writeBytes(Native Method)
>        at java.io.FileOutputStream.write(FileOutputStream.java:260)
>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at org.apache.hadoop.streaming.PipeMapRed.write(PipeMapRed.java:635)
>        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:105)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
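(The "Broken pipe" IOException in PipeMapRed.write typically means the
streaming child process, i.e. the user's mapper script, exited before Hadoop
finished feeding it input, often because the script crashed on a bad record.
A small self-contained Python sketch of that failure mode, with a
hypothetical child script standing in for the mapper:)

```python
import subprocess
import sys

# Simulate what Hadoop Streaming's PipeMapRed does: write input records to
# the user script's stdin. The child here reads one line and then exits,
# standing in for a mapper that crashes on a bad record. Subsequent writes
# from the parent then fail with a broken pipe, the Python counterpart of
# the java.io.IOException: Broken pipe in the task log above.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdin.readline(); sys.exit(1)"],
    stdin=subprocess.PIPE,
)

err = None
try:
    for _ in range(100000):
        proc.stdin.write(b"record\n")
        proc.stdin.flush()
except BrokenPipeError as e:  # raised once the child has exited
    err = e
proc.wait()
print(type(err).__name__)
```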