Hadoop, mail # user - how to figure out the range of a split that failed?


Re: how to figure out the range of a split that failed?
edward choi 2010-07-01, 05:15
Dear Sharad,

I have come across another problem, and I hope you can help me with this one too.
I am trying to use the SkipBadRecords feature with Hadoop Streaming.
The streaming command I use is: "hadoop jar
$HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar"
Your example uses a Java application, which I cannot use because I am
running a C++ application through Hadoop Streaming.

So what I am doing is:
hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar -D
mapred.skip.mode.enabled=true -D mapred.skip.attempts.to.start.skipping=2 -D
mapred.skip.map.max.skip.records=Long.MAX_VALUE -D mapred.reduce.tasks=0
-file "..." -mapper "..." -input "..." -output "..."

Then I noticed that you have to set
"mapred.skip.map.auto.incr.proc.count=false" and increment
COUNTER_MAP_PROCESSED_RECORDS in your own application. I guess you can
do this in your Java example, but I don't know how to do it from a C++
program run through Hadoop Streaming. Could you enlighten me, please?

Sincerely, Ed

2010/6/30 Sharad Agarwal <[EMAIL PROTECTED]>

> edward choi wrote:
>
>> Thanks for the quick response.
>> I know the SkipBadRecords feature but unfortunately I cannot use it since I
>> am running my job on Hadoop Streaming.
>> I had asked if there were any way to use SkipBadRecords in Hadoop Streaming
>> but never got an answer. I guess it is not possible at all.
>> Thanks for your concern.
>>
>>
> The SkipBadRecords feature can be used for streaming as well. Perhaps the best
> example is the test case:
> http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/streaming/src/test/org/apache/hadoop/streaming/TestStreamingBadRecords.java?view=markup
>
> Sharad
>