Mark question 2011-06-11, 21:55
Hi,
1) Where can I find the "main" class of Hadoop? The one that calls the InputFormat, then the MapperRunner, the ReducerRunner and the others?
This will help me understand what is in memory versus still on disk, and the exact flow of data between a split and the mappers.
My problem is this: assuming I have a TextInputFormat and would like to modify the input in memory before it is read by the RecordReader, where should I do that?
InputFormat was my first guess, but unfortunately it only defines the logical splits. So the only way I can think of is to use the RecordReader to read all the records in a split into another variable (in the format I want), then process that variable with map functions.
But is that efficient? To understand this, I hope someone can answer Q(1).
Thank you, Mark
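Mark's workaround (read every record of a split into another variable, then run the map logic over it) can be compared with transforming records one at a time. The following is a plain-Java model, not Hadoop code: a `String` stands in for one split, and `SplitProcessing`, `bufferThenMap` and `streamAndMap` are hypothetical names. Both methods feed the same records to `map`; the buffered version keeps the whole transformed split in a list before mapping, while the streaming version holds only the current record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Plain-Java model, not Hadoop code: the String argument stands in for one split.
final class SplitProcessing {

    // Mark's proposal: read and transform every record of the split into a list,
    // then run the map logic over that list. The list grows with the split size.
    static void bufferThenMap(String split, UnaryOperator<String> transform, Consumer<String> map) {
        List<String> buffered = new ArrayList<>();
        try (Scanner in = new Scanner(split)) {
            while (in.hasNextLine()) {
                buffered.add(transform.apply(in.nextLine()));
            }
        }
        for (String record : buffered) {
            map.accept(record);
        }
    }

    // Streaming alternative: transform each record right before it is mapped.
    // No extra list is kept; only the current record is live at any time.
    static void streamAndMap(String split, UnaryOperator<String> transform, Consumer<String> map) {
        try (Scanner in = new Scanner(split)) {
            while (in.hasNextLine()) {
                map.accept(transform.apply(in.nextLine()));
            }
        }
    }
}
```

Since the output is identical, the extra buffering only costs memory (and delays the first `map` call until the whole split has been read), which is why a per-record transformation is usually preferable.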
Harsh J 2011-06-12, 04:42
Mark,
I may not have gotten your question exactly, but you can do further processing inside your FileInputFormat derivative's RecordReader implementation, just before it loads the value for a next() call (which the MapRunner uses to read records).
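Transforming each value inside the RecordReader just before it is handed out amounts to wrapping one reader in another (the decorator pattern). The sketch below is self-contained and does not use Hadoop classes: `SimpleRecordReader`, `ListLineReader` and `TransformingRecordReader` are hypothetical types whose per-record calls are shaped loosely after `nextKeyValue()`/`getCurrentKey()`/`getCurrentValue()` in the newer `org.apache.hadoop.mapreduce` API (the older `mapred` API, which `MapRunner` belongs to, uses `next(key, value)` instead). In real Hadoop code, the rough equivalent would be a `TextInputFormat` subclass whose `createRecordReader` returns a wrapper around `LineRecordReader`; treat that mapping as an outline rather than tested code.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical reader interface, loosely shaped like the new-API RecordReader.
// It is NOT org.apache.hadoop.mapreduce.RecordReader.
interface SimpleRecordReader<K, V> {
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
}

// Stand-in for a line reader over one split (the List plays the split's lines).
final class ListLineReader implements SimpleRecordReader<Long, String> {
    private final Iterator<String> lines;
    private long key = -1;
    private String value;

    ListLineReader(List<String> lines) { this.lines = lines.iterator(); }

    public boolean nextKeyValue() {
        if (!lines.hasNext()) return false;
        value = lines.next();
        key++;
        return true;
    }
    public Long getCurrentKey() { return key; }
    public String getCurrentValue() { return value; }
}

// Wraps another reader and rewrites each value as it is loaded, i.e. just before
// the map loop asks for it. Only one record is transformed and held at a time.
final class TransformingRecordReader<K> implements SimpleRecordReader<K, String> {
    private final SimpleRecordReader<K, String> inner;
    private final UnaryOperator<String> transform;
    private String current;

    TransformingRecordReader(SimpleRecordReader<K, String> inner, UnaryOperator<String> transform) {
        this.inner = inner;
        this.transform = transform;
    }

    public boolean nextKeyValue() {
        if (!inner.nextKeyValue()) return false;
        current = transform.apply(inner.getCurrentValue()); // the in-memory modification
        return true;
    }
    public K getCurrentKey() { return inner.getCurrentKey(); }
    public String getCurrentValue() { return current; }
}
```

Because the wrapper only delegates, the underlying line-splitting logic stays untouched and the mapper never sees the raw value.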
If you're looking to dig into Hadoop's source code to understand the flow yourself, MapTask.java is what you may be looking for (run* methods).
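As a rough, hypothetical model of the flow those methods drive (not a transcription of MapTask): the InputFormat only computes logical splits, a RecordReader is opened per split, and a loop pulls one record at a time and calls map on it. `MapFlowModel`, `Split`, `LineReader` and `runMapTask` are illustrative names. The `List<String>` plays the role of a file on disk, splits are counted in lines for simplicity (real FileInputFormat splits are byte ranges), and the line index stands in for the byte offset that TextInputFormat uses as the key. In real Hadoop, each split is normally processed by its own map task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Simplified, hypothetical model of the per-split flow; not Hadoop's API.
final class MapFlowModel {

    // A logical split only describes a range of the input; no data is read yet.
    static final class Split {
        final int start, end; // [start, end) line indexes into the input
        Split(int start, int end) { this.start = start; this.end = end; }
    }

    // Step 1 (InputFormat's role): compute logical splits from the input size.
    static List<Split> getSplits(int totalLines, int linesPerSplit) {
        List<Split> splits = new ArrayList<>();
        for (int s = 0; s < totalLines; s += linesPerSplit) {
            splits.add(new Split(s, Math.min(s + linesPerSplit, totalLines)));
        }
        return splits;
    }

    // Step 2 (RecordReader's role): turn one split into records, one per call.
    static final class LineReader {
        private final List<String> input;
        private final Split split;
        private int pos;
        private String current;

        LineReader(List<String> input, Split split) {
            this.input = input;
            this.split = split;
            this.pos = split.start;
        }
        boolean nextKeyValue() {
            if (pos >= split.end) return false;
            current = input.get(pos++);
            return true;
        }
        long getCurrentKey() { return pos - 1; } // stands in for a byte offset
        String getCurrentValue() { return current; }
    }

    // Step 3 (the map loop): pull one record at a time and hand it to map.
    static void runMapTask(List<String> input, Split split, BiConsumer<Long, String> map) {
        LineReader reader = new LineReader(input, split);
        while (reader.nextKeyValue()) {
            map.accept(reader.getCurrentKey(), reader.getCurrentValue());
        }
    }
}
```

In this model only the current record passes through the map loop at any moment, which is the property that makes per-record transformation in the reader cheap.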
-- Harsh J
madhu phatak 2011-06-21, 10:50
Define your own custom RecordReader; it's efficient.