Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

MapReduce >> mail # user >> Mapper Getting the whole split and not just line by line


Re: Mapper Getting the whole split and not just line by line
Thanks!
I'll try overriding the run method first.
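A minimal sketch of that run() override, following the in-memory caching approach discussed in the quoted messages: read every record of the split first, then decide what to emit. To keep it runnable without Hadoop, `SplitContext` is a hypothetical stand-in for the few `Mapper.Context` methods used (`nextKeyValue()`, `getCurrentKey()`, `getCurrentValue()`, `write(key, value)`), `ListSplitContext` is a hypothetical in-memory split, and the "emit only the longest line(s)" rule is an arbitrary example of a decision that needs the whole split.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the Mapper.Context methods an overridden run()
 * uses; it exists only so this sketch compiles without Hadoop.
 */
interface SplitContext {
    boolean nextKeyValue();            // advance to the next record of the split
    long getCurrentKey();              // byte offset of the line (as with TextInputFormat)
    String getCurrentValue();          // the line itself
    void write(String key, long value);
}

class ScanThenEmitMapper {
    /** Same shape as an overridden Mapper.run(): read the whole split, then emit. */
    void run(SplitContext context) {
        // Pass 1: cache every record in memory (the cost Harsh points out).
        List<Long> offsets = new ArrayList<>();
        List<String> lines = new ArrayList<>();
        int maxLength = 0;
        while (context.nextKeyValue()) {
            String line = context.getCurrentValue();
            offsets.add(context.getCurrentKey());
            lines.add(line);
            maxLength = Math.max(maxLength, line.length());
        }
        // Pass 2: decide with knowledge of the whole split.
        // Hypothetical rule: emit only the longest line(s).
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).length() == maxLength) {
                context.write(lines.get(i), offsets.get(i));
            }
        }
    }
}

/** In-memory split over a list of lines; offsets assume ASCII and '\n' endings. */
class ListSplitContext implements SplitContext {
    private final List<String> lines;
    private int index = -1;
    private long currentKey = -1;
    private long nextOffset = 0;
    final List<String> emitted = new ArrayList<>();

    ListSplitContext(List<String> lines) {
        this.lines = lines;
    }

    public boolean nextKeyValue() {
        if (index + 1 >= lines.size()) {
            return false;
        }
        index++;
        currentKey = nextOffset;
        nextOffset += lines.get(index).length() + 1; // +1 for the '\n'
        return true;
    }

    public long getCurrentKey() { return currentKey; }

    public String getCurrentValue() { return lines.get(index); }

    public void write(String key, long value) { emitted.add(key + "@" + value); }
}
```

For example, with the lines "a", "bbb", "cc", "ddd" the context records `bbb@2` and `ddd@9`. In a real job the same loop would live in a subclass of `org.apache.hadoop.mapreduce.Mapper` with `LongWritable`/`Text` keys and values; because the RecordReader reuses its `Text` instance, copy each value (for example with `toString()`) before caching it. Hadoop's default run() also calls `setup(context)` before the loop and `cleanup(context)` after it, so an override should keep those calls.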

On Wed, Oct 12, 2011 at 3:18 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Yaron,
>
> That would certainly seem to be the easy way out, with the only
> downside being that you'd have to cache your values in memory.
>
> If you plug in deeper, at the RecordReader level (which provides
> the specific nextKV(…) methods), you can perhaps keep just a list
> of the offsets of all matching lines and re-read the whole split
> in a second pass. This costs slightly more I/O, since you seek
> through the split again, but the benefit is lower memory
> consumption, if that is a concern here.
>
> [Or go the longer way, and use the Reducer phase!]
>
> On Wed, Oct 12, 2011 at 5:14 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
> > Thanks for the fast reply!
> > I've dug into the code a bit, and it seems to me that I can achieve my
> > goal by overriding the Mapper.run method: just iterate over the whole
> > split using context.nextKeyValue() and then call map only with the
> > values I need.
> > Since I'm a novice Hadooper, am I thinking about this the wrong way?
> >
> > thanks again,
> > yaron
> >
> > On Wed, Oct 12, 2011 at 12:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>
> >> Hello Yaron,
> >>
> >> Yes, this is possible to do.
> >>
> >> You need to plug your own RecordReader implementation into the job
> >> to control which records are emitted and what is done before the
> >> key-value pairs are fed into map(…).
> >>
> >> On Wed, Oct 12, 2011 at 2:42 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
> >> > Hi,
> >> > The Mapper's map method receives a single line from the split as its
> >> > parameter. Is there a way for Mappers to get the whole split as input?
> >> > I'd like to scan the whole split before I decide which key-value pairs
> >> > to emit to the reducer.
> >> > Thanks
> >> > yaron
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>
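Harsh's lower-memory alternative, keeping only the offsets of matching lines and re-reading them in a second pass, can be sketched in plain Java. Assumptions: a local file read through `java.io.RandomAccessFile` stands in for the split (in HDFS, `FSDataInputStream` offers the analogous `seek(long)` and `getPos()`), and the "emit only if at least `minMatches` lines matched" rule is hypothetical, chosen only because it needs the whole split to be seen first. `OffsetRescan`, `scan` and `writeTempSplit` are illustrative names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class OffsetRescan {
    /**
     * Two passes over one input (a local file standing in for a split).
     * Pass 1 keeps only the byte offsets of lines containing `needle`;
     * pass 2 seeks back to each offset and re-reads that line.
     * readLine() maps each byte to one char, so this assumes ASCII input.
     */
    static List<String> scan(Path split, String needle, int minMatches) {
        List<Long> matchOffsets = new ArrayList<>();
        List<String> emitted = new ArrayList<>();
        try (RandomAccessFile in = new RandomAccessFile(split.toFile(), "r")) {
            // Pass 1: remember where each matching line starts, not the line itself.
            long lineStart = in.getFilePointer();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains(needle)) {
                    matchOffsets.add(lineStart);
                }
                lineStart = in.getFilePointer();
            }
            // Whole-split decision (hypothetical rule): emit only if enough lines matched.
            if (matchOffsets.size() >= minMatches) {
                // Pass 2: extra seeks and reads, but memory held only the offsets.
                for (long start : matchOffsets) {
                    in.seek(start);
                    emitted.add(start + "\t" + in.readLine());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return emitted;
    }

    /** Helper for trying the sketch: writes ASCII text to a temporary file. */
    static Path writeTempSplit(String contents) {
        try {
            Path p = Files.createTempFile("split", ".txt");
            Files.write(p, contents.getBytes(StandardCharsets.US_ASCII));
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the input "foo 1\nbar\nfoo 2\n" and `minMatches` of 2, pass 1 records offsets 0 and 10, and pass 2 re-reads "foo 1" and "foo 2" from those positions; with `minMatches` of 3 nothing is emitted. Memory now grows with the number of matches rather than with the size of the split. A split-aware version would also have to honor the split's start and end offsets the way LineRecordReader does, since a split rarely begins or ends on a line boundary.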