NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Hadoop >> mail # user >> Hadoop Runner


Re: Hadoop Runner
Define your own custom RecordReader; it's efficient.
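As a rough illustration of this suggestion (and of Harsh's reply below), here is a Hadoop-free Java sketch of the wrapping idea. `SimpleRecordReader`, `Key`, `Value`, `LineReader` and `TransformingReader` are hypothetical stand-ins modeled on the old `org.apache.hadoop.mapred.RecordReader` contract, where `next(key, value)` refills caller-supplied objects and returns `false` at the end of the split; `LineReader` plays the part of `LineRecordReader`, and upper-casing is only a placeholder for whatever in-memory change you need. Because each record is modified as it is handed out, nothing beyond the current record has to be buffered.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hadoop-free sketch of the old-API (org.apache.hadoop.mapred) RecordReader pattern.
// Every name below is a hypothetical stand-in, not a Hadoop class.
class TransformingReaderSketch {

  // Mutable holders in the roles of LongWritable and Text: created once by the caller,
  // then refilled by every next() call.
  static final class Key { long offset; }
  static final class Value { String line = ""; }

  // Same shape as mapred.RecordReader#next(K, V): returns false when the split is exhausted.
  interface SimpleRecordReader {
    boolean next(Key key, Value value) throws IOException;
    void close() throws IOException;
  }

  // Stand-in for LineRecordReader: key = character offset of the line, value = the line text.
  static final class LineReader implements SimpleRecordReader {
    private final BufferedReader in;
    private long pos = 0;

    LineReader(String text) { this.in = new BufferedReader(new StringReader(text)); }

    @Override
    public boolean next(Key key, Value value) throws IOException {
      String line = in.readLine();
      if (line == null) {
        return false;
      }
      key.offset = pos;
      value.line = line;
      pos += line.length() + 1; // assumes single '\n' line terminators
      return true;
    }

    @Override
    public void close() throws IOException { in.close(); }
  }

  // The custom reader: delegate to the stock reader, then modify the value in memory
  // just before returning it to the map loop. Only the current record is held.
  static final class TransformingReader implements SimpleRecordReader {
    private final SimpleRecordReader inner;
    private final UnaryOperator<String> transform;

    TransformingReader(SimpleRecordReader inner, UnaryOperator<String> transform) {
      this.inner = inner;
      this.transform = transform;
    }

    @Override
    public boolean next(Key key, Value value) throws IOException {
      if (!inner.next(key, value)) {
        return false;
      }
      value.line = transform.apply(value.line);
      return true;
    }

    @Override
    public void close() throws IOException { inner.close(); }
  }

  // Drains a reader the way the map loop does: one reused key/value pair, one record at a time.
  static List<String> readAll(SimpleRecordReader reader) {
    List<String> seen = new ArrayList<>();
    Key key = new Key();
    Value value = new Value();
    try {
      try {
        while (reader.next(key, value)) {
          seen.add(key.offset + ":" + value.line); // stand-in for calling map(key, value, ...)
        }
      } finally {
        reader.close();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return seen;
  }

  public static void main(String[] args) {
    SimpleRecordReader reader =
        new TransformingReader(new LineReader("hello\nhadoop\n"), String::toUpperCase);
    System.out.println(readAll(reader)); // [0:HELLO, 6:HADOOP]
  }
}
```

In a real job the same wrapper would go around `org.apache.hadoop.mapred.LineRecordReader` and be returned from `getRecordReader()` in a `TextInputFormat` subclass, so split computation and line handling stay with the stock classes and only the value changes.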

On Sun, Jun 12, 2011 at 10:12 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Mark,
>
> I may not have understood your question exactly, but you can do further
> processing inside your FileInputFormat derivative's RecordReader
> implementation (just before it loads the value for a next() call, which
> is what the MapRunner uses to read records).
>
> If you're looking to dig into Hadoop's source code to understand the
> flow yourself, MapTask.java is what you may be looking for (run*
> methods).
>
> On Sun, Jun 12, 2011 at 3:25 AM, Mark question <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> >  1) Where can I find the "main" class of Hadoop, the one that calls the
> > InputFormat, then the MapperRunner, ReducerRunner and others?
> >
> >    This will help me understand what is in memory or still on disk, and
> > the exact flow of data between splits and mappers.
> >
> > My problem: assuming I have a TextInputFormat and would like to modify
> > the input in memory before it is read by the RecordReader, where should I
> > do that?
> >
> >    InputFormat was my first guess, but unfortunately it only defines the
> > logical splits. So the only way I can think of is to use the RecordReader
> > to read all the records in the split into another variable (in the format
> > I want), then process that variable in the map function.
> >
> >   But is that efficient? To understand this, I hope someone can answer
> > Q(1).
> >
> > Thank you,
> > Mark
> >
>
>
>
> --
> Harsh J
>
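To make the split-to-mapper flow from the original question concrete, here is a simplified, Hadoop-free Java model. `computeSplits` is a simplified take on what `FileInputFormat.getSplits()` produces: plain byte ranges (the real method also weighs block size, the configured minimum/maximum split sizes and block locations), so a split holds offsets, not data. `readSplit` mirrors the boundary rule `LineRecordReader` follows: a split that does not start at offset 0 skips its first, possibly partial line, and every split keeps reading while a line starts at or before its end offset, so a line straddling a boundary is read exactly once. All class and method names are hypothetical, and the whole "file" sits in a byte array purely for the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hadoop-free model of: InputFormat computes splits -> one reader per split -> map loop.
// All names are hypothetical; a byte array stands in for the file in HDFS.
class SplitFlowSketch {

  // A logical split is only a byte range [start, end); creating it reads no data.
  static final class Split {
    final int start;
    final int end;

    Split(int start, int end) {
      this.start = start;
      this.end = end;
    }
  }

  // Simplified split computation: fixed-size byte ranges covering the file.
  static List<Split> computeSplits(int fileLength, int splitSize) {
    if (splitSize <= 0) {
      throw new IllegalArgumentException("splitSize must be positive");
    }
    List<Split> splits = new ArrayList<>();
    for (int s = 0; s < fileLength; s += splitSize) {
      splits.add(new Split(s, Math.min(s + splitSize, fileLength)));
    }
    return splits;
  }

  // Index just past the next '\n' at or after `from`, or file.length if none remains.
  static int nextLineStart(byte[] file, int from) {
    int i = from;
    while (i < file.length && file[i] != '\n') {
      i++;
    }
    return i < file.length ? i + 1 : file.length;
  }

  // Returns the lines owned by one split, following LineRecordReader's boundary rule.
  static List<String> readSplit(byte[] file, Split split) {
    List<String> lines = new ArrayList<>();
    int pos = split.start;
    if (pos != 0) {
      // Skip through the first '\n': the previous split already read this line.
      pos = nextLineStart(file, pos);
    }
    // A line that starts at or before `end` belongs to this split, even if it ends later.
    while (pos <= split.end && pos < file.length) {
      int next = nextLineStart(file, pos);
      int lineEnd = file[next - 1] == '\n' ? next - 1 : next; // drop the terminator
      lines.add(new String(file, pos, lineEnd - pos, StandardCharsets.UTF_8));
      pos = next;
    }
    return lines;
  }

  // One "map task" per split; each inner list is what its record loop would pass to map().
  static List<List<String>> runJob(byte[] file, int splitSize) {
    List<List<String>> perTask = new ArrayList<>();
    for (Split split : computeSplits(file.length, splitSize)) {
      perTask.add(readSplit(file, split));
    }
    return perTask;
  }

  public static void main(String[] args) {
    byte[] file = "aaa\nbbbb\ncc\n".getBytes(StandardCharsets.UTF_8);
    System.out.println(runJob(file, 5)); // [[aaa, bbbb], [cc], []]
  }
}
```

Each inner list stands for what one map task's record loop would hand to `map()`; in Hadoop itself that loop is driven from `MapTask`'s `run*` methods (through `MapRunner` in the old API), pulling one record per `next()` call rather than materializing a list.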