Re: Hadoop Runner
Define your own custom RecordReader; it is efficient.
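
To illustrate the idea, here is a minimal sketch in plain Java with no Hadoop dependency. It wraps one reader in another that rewrites each value just before the caller sees it, which is the pattern Harsh describes below. SimpleRecordReader, LineReader, TransformingReader, and RecordReaderSketch are hypothetical names made up for this example; they only imitate the next(key, value) style of the old mapred RecordReader and are not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a next(key, value)-style reader contract.
// NOT Hadoop's real interface: the caller supplies holders that get refilled.
interface SimpleRecordReader {
    boolean next(long[] key, StringBuilder value);
}

// Serves lines from memory; the key is the running character offset of the line.
class LineReader implements SimpleRecordReader {
    private final Iterator<String> lines;
    private long offset = 0;

    LineReader(List<String> lines) {
        this.lines = lines.iterator();
    }

    @Override
    public boolean next(long[] key, StringBuilder value) {
        if (!lines.hasNext()) {
            return false;
        }
        String line = lines.next();
        key[0] = offset;
        value.setLength(0);
        value.append(line);
        offset += line.length() + 1; // +1 for the newline (assumes one unit per char)
        return true;
    }
}

// Decorator: delegates to an inner reader, then rewrites the value in place.
// Only one record is held at a time, so the split is never buffered whole.
class TransformingReader implements SimpleRecordReader {
    private final SimpleRecordReader inner;
    private final UnaryOperator<String> transform;

    TransformingReader(SimpleRecordReader inner, UnaryOperator<String> transform) {
        this.inner = inner;
        this.transform = transform;
    }

    @Override
    public boolean next(long[] key, StringBuilder value) {
        if (!inner.next(key, value)) {
            return false;
        }
        String changed = transform.apply(value.toString());
        value.setLength(0);
        value.append(changed);
        return true;
    }
}

class RecordReaderSketch {
    // Same shape as a map-side read loop: reuse one key/value pair and
    // hand each record to the "map" step (here it is just collected).
    static List<String> runMapLoop(SimpleRecordReader reader) {
        List<String> seen = new ArrayList<>();
        long[] key = new long[1];
        StringBuilder value = new StringBuilder();
        while (reader.next(key, value)) {
            seen.add(key[0] + ":" + value);
        }
        return seen;
    }
}
```

For example, RecordReaderSketch.runMapLoop(new TransformingReader(new LineReader(lines), String::toUpperCase)) hands upper-cased lines to the loop one at a time. In a real job, the rough equivalent would be a FileInputFormat subclass whose getRecordReader returns such a wrapper around the stock line reader, but treat that as an outline rather than tested code.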

On Sun, Jun 12, 2011 at 10:12 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Mark,
>
> I may not have gotten your question exactly, but you can do further
> processing inside of your FileInputFormat derivative's RecordReader
> implementation (just before it loads the value for a next() form of
> call -- which the MapRunner would use to read).
>
> If you're looking to dig into Hadoop's source code to understand the
> flow yourself, MapTask.java is what you may be looking for (run*
> methods).
>
> On Sun, Jun 12, 2011 at 3:25 AM, Mark question <[EMAIL PROTECTED]>
> wrote:
> > Hi,
> >
> >  1) Where can I find the "main" class of hadoop? The one that calls the
> > InputFormat then the MapperRunner and ReducerRunner and others?
> >
> >    This will help me understand what is in memory or still on disk, and the
> > exact flow of data between splits and mappers.
> >
> > My problem is, assuming I have a TextInputFormat and would like to modify
> > the input in memory before being read by RecordReader... where shall I do
> > that?
> >
> >    InputFormat was my first guess, but unfortunately it only defines the
> > logical splits. So the only way I can think of is to use the RecordReader
> > to read all the records in the split into another variable (in the format I
> > want), then process that variable with the map functions.
> >
> >   But is that efficient? So, to understand this, I hope someone can give an
> > answer to Q(1).
> >
> > Thank you,
> > Mark
> >
>
>
>
> --
> Harsh J
>