Hadoop >> mail # user >> how to figure out the range of a split that failed?


Re: how to figure out the range of a split that failed?
Dear Sharad,

I have come across another problem, and I hope you can help me with this one too.
I am trying to use the SkipBadRecords feature with Hadoop Streaming.
I launch streaming with: "hadoop jar
$HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar"
Your example uses a Java application, which I cannot use because I am
running a C++ application that I connect to Hadoop through Streaming.

So what I am doing is:
hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
  -D mapred.skip.mode.enabled=true \
  -D mapred.skip.attempts.to.start.skipping=2 \
  -D mapred.skip.map.max.skip.records=Long.MAX_VALUE \
  -D mapred.reduce.tasks=0 \
  -file "..." -mapper "..." -input "..." -output "..."

Then I noticed that you have to set
"mapred.skip.map.auto.incr.proc.count=false" and increment
COUNTER_MAP_PROCESSED_RECORDS in your own application. I guess you can do
this in your Java example, but I don't know how to do it from a C++ program
run through Hadoop Streaming. Could you enlighten me please?

Sincerely, Ed

2010/6/30 Sharad Agarwal <[EMAIL PROTECTED]>

> edward choi wrote:
>
>> Thanks for the quick response.
>> I know the SkipBadRecords feature but unfortunately I cannot use it since
>> I am running my job on Hadoop Streaming.
>> I had asked if there were any way to use SkipBadRecords in Hadoop
>> Streaming but never got an answer. I guess it is not possible at all.
>> Thanks for your concern.
>>
>>
> The SkipBadRecords feature can be used for streaming as well. Perhaps the
> best example is the testcase:
> http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/streaming/src/test/org/apache/hadoop/streaming/TestStreamingBadRecords.java?view=markup
>
> Sharad
>
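
For a C++ mapper, one way to increment COUNTER_MAP_PROCESSED_RECORDS is the
Hadoop Streaming counter protocol: a line of the form
"reporter:counter:<group>,<counter>,<amount>" written to stderr is picked up
by the framework as a counter update. The sketch below is a minimal
illustration, not a tested production mapper. It assumes that the counter
group and name are the string values of SkipBadRecords.COUNTER_GROUP and
SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS ("SkippingTaskCounters" and
"MapProcessedRecords"), which should be checked against the Hadoop version in
use. The names run_mapper, process_record and processed_counter_line are
hypothetical helpers made up for this example. It is meant to be run with
mapred.skip.map.auto.incr.proc.count=false, as described in the message above.

```cpp
#include <iostream>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Assumed to match SkipBadRecords.COUNTER_GROUP and
// SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS; verify against your Hadoop version.
const std::string kCounterGroup = "SkippingTaskCounters";
const std::string kMapProcessedCounter = "MapProcessedRecords";

// Builds the stderr line that Hadoop Streaming treats as a counter update:
//   reporter:counter:<group>,<counter>,<amount>
std::string processed_counter_line(long amount) {
    std::ostringstream line;
    line << "reporter:counter:" << kCounterGroup << ","
         << kMapProcessedCounter << "," << amount;
    return line.str();
}

// Hypothetical per-record work; replace with the real application logic.
// A crash in here on a bad record is what skip mode is meant to isolate.
std::string process_record(const std::string& record) {
    return record;
}

// Reads one record per line, writes the result, and reports each record
// as processed only after its output has been written and flushed.
int run_mapper(std::istream& in, std::ostream& out, std::ostream& err) {
    std::string record;
    while (std::getline(in, record)) {
        out << process_record(record) << '\n';
        out.flush();
        // std::endl flushes, so the counter update is sent immediately.
        err << processed_counter_line(1) << std::endl;
    }
    return 0;
}
```

A complete program would only need
`int main() { return run_mapper(std::cin, std::cout, std::cerr); }`, with the
compiled binary shipped via -file and named in -mapper. Reporting after every
record, rather than in batches, keeps the processed count in step with the
input, so a crash can be narrowed down to a small range of records.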