Re: All datanodes are bad IOException when trying to implement multithreading serialization
Sonal Goyal 2013-09-29, 23:59
Wouldn't you rather just change your split size so that you can have more mappers work on your input? What else are you doing in the mappers?
Sent from my iPad
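
For reference, the split size can be lowered so the same input fans out to more map tasks. A minimal sketch assuming the Hadoop 2.x mapreduce-API FileInputFormat (the 32 MB cap is only an illustrative value, and the equivalent property name varies across Hadoop versions):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "more-mappers");

    // Cap each split at 32 MB so a large input file produces more map tasks.
    FileInputFormat.setMaxInputSplitSize(job, 32L * 1024 * 1024);

    // Equivalent configuration property in Hadoop 2.x (value in bytes):
    // conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 32L * 1024 * 1024);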

On Sep 30, 2013, at 2:22 AM, yunming zhang <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I was playing with the Hadoop code, trying to have a single Mapper read an input split using multiple threads. I am getting an "All datanodes are bad" IOException and am not sure what the issue is.
>
> The reason for this work is that I suspect my computation is slow because it takes too long to create the Text() objects from the input split using a single thread. I modified LineRecordReader (since I mostly use TextInputFormat) to provide additional methods for retrieving lines from the input split: getCurrentKey2(), getCurrentValue2(), and nextKeyValue2(). I created a second FSDataInputStream and a second LineReader object for getCurrentKey2() and getCurrentValue2() to read from. Essentially, I am trying to open the input split twice with different start points (one at the very beginning, the other in the middle of the split) so that two threads can read from it in parallel.
>
> In the org.apache.hadoop.mapreduce.Mapper.run() method, I read simultaneously using getCurrentKey() and getCurrentKey2() from Thread 1 and Thread 2 (both threads running at the same time):
>       Thread 1:
>       while (context.nextKeyValue()) {
>           map(context.getCurrentKey(), context.getCurrentValue(), context);
>       }
>
>       Thread 2:
>       while (context.nextKeyValue2()) {
>           map(context.getCurrentKey2(), context.getCurrentValue2(), context);
>           // System.out.println("two iter");
>       }
>
> However, this causes the "All datanodes are bad" exception. I believe I made sure the second file is closed. I have attached a copy of my LineRecordReader to show what I changed to enable two simultaneous reads of the input split.
>
> I have modified other files (org.apache.hadoop.mapreduce.RecordReader.java, mapred.MapTask.java, ...) just to enable Mapper.run() to call LineRecordReader.getCurrentKey2() and the other access methods for the second file.
>
>
> I would really appreciate it if anyone could give me a bit of advice or just point me in a direction as to where the problem might be.
>
> Thanks
>
> Yunming
>
> <LineRecordReader.java>
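
A rough reconstruction of the two-stream setup described above, with hypothetical names (the attached LineRecordReader.java is not reproduced here): the split's file is opened twice, and the second stream is seeked to the midpoint, discarding the partial line it lands in, just as LineRecordReader does when a split does not start at offset 0.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.util.LineReader;

    Configuration conf = new Configuration();
    Path file = new Path("/data/input/part-00000");   // hypothetical split file
    FileSystem fs = file.getFileSystem(conf);

    long start = 0;                                   // split start offset
    long end = fs.getFileStatus(file).getLen();       // split end offset
    long mid = start + (end - start) / 2;

    // First reader starts at the beginning of the split.
    FSDataInputStream in1 = fs.open(file);
    in1.seek(start);
    LineReader reader1 = new LineReader(in1, conf);

    // Second reader starts at the midpoint; skip the partial line it lands in.
    FSDataInputStream in2 = fs.open(file);
    in2.seek(mid);
    LineReader reader2 = new LineReader(in2, conf);
    reader2.readLine(new Text());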
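
Worth noting: the Context object shared by the two threads in the run() loop above is not designed for concurrent access, which is one reason this kind of modification can fail in surprising ways. The stock way to get multithreaded map execution without touching LineRecordReader is org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper, which runs several instances of a delegate mapper over one split behind a synchronized record reader. A minimal sketch (MyMapper is a hypothetical mapper class):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "multithreaded-map");

    // MultithreadedMapper is registered as the job's mapper...
    job.setMapperClass(MultithreadedMapper.class);
    // ...and delegates each record to MyMapper, here on a pool of 4 threads.
    MultithreadedMapper.setMapperClass(job, MyMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 4);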