-
Why not make InputSplit implement the Writable interface?
Jeff Zhang 2010-02-06, 07:09
Hi all,
I was looking at the Hadoop source code and found that InputSplit does not implement Writable. As I understand it, an InputSplit is transferred to each TaskTracker (TT) and then deserialized, so it should implement the Writable interface. I also checked each implementation of InputSplit, and all of the subclasses do implement Writable. So I think it would be better to let the abstract class InputSplit itself implement Writable; then users won't forget to implement write(DataOutput out) and readFields(DataInput in) when they write a customized InputSplit.
-- Best Regards
Jeff Zhang
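For reference, the write(DataOutput) / readFields(DataInput) pattern described in the question can be sketched as below. This is only an illustration under stated assumptions: WritableLike is a local stand-in that mirrors the two method signatures so the snippet compiles without Hadoop on the classpath (it is not org.apache.hadoop.io.Writable), and RangeSplit, its fields, and WritableSplitDemo are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the two methods named in the question.
// Assumption: this is NOT Hadoop's Writable, only a same-shaped interface.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical custom split describing a byte range of a file.
class RangeSplit implements WritableLike {
    String path = "";
    long start;
    long length;

    // No-arg constructor so the receiving side can create an empty
    // instance and then fill it in with readFields().
    RangeSplit() {}

    RangeSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(path);
        out.writeLong(start);
        out.writeLong(length);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in exactly the order they were written.
        path = in.readUTF();
        start = in.readLong();
        length = in.readLong();
    }
}

class WritableSplitDemo {
    // Serialize a split to a byte array via its write() method.
    static byte[] toBytes(WritableLike split) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            split.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild a split from bytes: create an empty instance, then readFields().
    static RangeSplit fromBytes(byte[] bytes) {
        try {
            RangeSplit split = new RangeSplit();
            split.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return split;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RangeSplit copy = fromBytes(toBytes(new RangeSplit("/data/part-0", 128L, 64L)));
        System.out.println(copy.path + " " + copy.start + " " + copy.length);
    }
}
```

If either method is left out, or the two disagree on field order, the split cannot be rebuilt correctly on the receiving side, which is the mistake the question is worried about.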
-
Re: Why not make InputSplit implement the Writable interface?
Tom White 2010-02-06, 17:01
Hi Jeff,
InputSplit in the new MapReduce API (the org.apache.hadoop.mapreduce package) does not implement Writable because splits can be serialized with any serialization framework, e.g. Java object serialization. You can see where splits are serialized in JobSplitWriter.writeNewSplits() and where they are deserialized on the task node in MapTask.getSplitDetails(). This contrasts with the old API, which required every InputSplit to be Writable.
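To make the "any serialization framework" point concrete, here is a minimal sketch of a split-like object that round-trips through plain Java object serialization with no write()/readFields() methods at all. It uses only java.io and is not Hadoop code: SerializableRangeSplit and JavaSerializationDemo are hypothetical names, and a real split would extend org.apache.hadoop.mapreduce.InputSplit and rely on a matching serialization being enabled in the job configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical split-like value object; it only needs to be Serializable.
class SerializableRangeSplit implements Serializable {
    private static final long serialVersionUID = 1L;

    final String path;
    final long start;
    final long length;

    SerializableRangeSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }
}

class JavaSerializationDemo {
    // Serialize any Serializable object with Java object serialization.
    static byte[] serialize(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize; the concrete class is recovered from the stream itself.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializableRangeSplit copy = (SerializableRangeSplit)
                deserialize(serialize(new SerializableRangeSplit("/data/part-0", 128L, 64L)));
        System.out.println(copy.path + " " + copy.start + " " + copy.length);
    }
}
```

The trade-off is the usual one: Java object serialization needs no hand-written field code, while a Writable-style encoding is more compact and keeps the wire format under the author's control.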
Cheers,
Tom
-
Re: Why not make InputSplit implement the Writable interface?
Jeff Zhang 2010-02-07, 02:02
Tom,
Thanks for your help.
-- Best Regards
Jeff Zhang