Define your own custom RecordReader; it is efficient.
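Here is a rough sketch of the idea in plain Java, with no Hadoop dependency so it can be run on its own. The names RecordSource, IteratorSource, TransformingSource and drain are made up for illustration and are not Hadoop's API; in a real job the same wrapping would go inside the RecordReader returned by your FileInputFormat subclass. The point it tries to show is that each record is rewritten as it is pulled, so the split never has to be copied into a separate variable first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

public class TransformingReaderSketch {

    /** Hypothetical stand-in for a record reader; this is NOT Hadoop's interface. */
    public interface RecordSource {
        /** Returns the next record, or null once the split is exhausted. */
        String next();
    }

    /** Yields records one at a time from an iterator (a stand-in for one input split). */
    public static final class IteratorSource implements RecordSource {
        private final Iterator<String> records;

        public IteratorSource(Iterator<String> records) {
            this.records = records;
        }

        @Override
        public String next() {
            return records.hasNext() ? records.next() : null;
        }
    }

    /** Wraps another source and rewrites each record just before handing it out. */
    public static final class TransformingSource implements RecordSource {
        private final RecordSource delegate;
        private final UnaryOperator<String> transform;

        public TransformingSource(RecordSource delegate, UnaryOperator<String> transform) {
            this.delegate = delegate;
            this.transform = transform;
        }

        @Override
        public String next() {
            String record = delegate.next();
            // Pass the end-of-input marker through untouched.
            return record == null ? null : transform.apply(record);
        }
    }

    /** Stand-in for the loop that feeds a mapper: pulls records until the source is empty. */
    public static List<String> drain(RecordSource source) {
        List<String> seen = new ArrayList<>();
        for (String record = source.next(); record != null; record = source.next()) {
            seen.add(record);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("  first line ", "second line");
        RecordSource source = new TransformingSource(
                new IteratorSource(split.iterator()),
                line -> line.trim().toUpperCase());
        for (String record : drain(source)) {
            System.out.println(record);
        }
    }
}
```

The drain method collects records into a list only so the result is easy to inspect; a map loop would consume each record as it arrives and hold just one at a time.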
On Sun, Jun 12, 2011 at 10:12 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> I may not have gotten your question exactly, but you can do further
> processing inside of your FileInputFormat derivative's RecordReader
> implementation (just before it loads the value for a next() form of
> call -- which the MapRunner would use to read).
> If you're looking to dig into Hadoop's source code to understand the
> flow yourself, MapTask.java is what you may be looking for (run*
> On Sun, Jun 12, 2011 at 3:25 AM, Mark question <[EMAIL PROTECTED]> wrote:
> > Hi,
> > 1) Where can I find the "main" class of hadoop? The one that calls the
> > InputFormat then the MapperRunner and ReducerRunner and others?
> > This will help me understand what is in memory or still on disk, and the
> > flow of data between splits and mappers.
> > My problem is, assuming I have a TextInputFormat and would like to modify
> > the input in memory before being read by RecordReader... where shall I do
> > that?
> > InputFormat was my first guess, but unfortunately, it only defines the
> > logical splits ... So, the only way I can think of is use the
> > to read all the records in the split into another variable (with the format I
> > want) and then process that variable with the map functions.
> > But is that efficient? To understand this, I hope someone can give an
> > answer to Q(1).
> > Thank you,
> > Mark
> Harsh J