I'll try overriding the run() method first.
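For the record, a minimal, self-contained sketch of that pattern: buffer every value from the split inside run(), then call map() only for the records you decide to keep. MiniContext and shouldEmit() are hypothetical stand-ins (for Hadoop's Mapper.Context and the actual filtering logic) so the example runs without the Hadoop jars on the classpath:

```java
// Sketch of the "override run()" idea: first pass buffers the whole
// split, second pass invokes map() only for the records we keep.
// MiniContext is a hypothetical stand-in for Mapper.Context.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class WholeSplitDemo {

    // Minimal stand-in for Mapper.Context: iterates records, collects output.
    static class MiniContext {
        private final Iterator<String> records;
        private String current;
        final List<String> output = new ArrayList<>();

        MiniContext(List<String> split) { this.records = split.iterator(); }

        boolean nextKeyValue() {
            if (!records.hasNext()) return false;
            current = records.next();
            return true;
        }

        String getCurrentValue() { return current; }

        void write(String value) { output.add(value); }
    }

    // The run() override: pass 1 buffers every value; pass 2 calls
    // map() only for values matching a (hypothetical) predicate.
    static void run(MiniContext context) {
        List<String> values = new ArrayList<>();
        while (context.nextKeyValue()) {
            values.add(context.getCurrentValue());
        }
        for (String value : values) {
            if (shouldEmit(value, values)) {
                map(value, context);
            }
        }
    }

    // Stand-in for Mapper.map(): just forwards the value.
    static void map(String value, MiniContext context) {
        context.write(value);
    }

    // Hypothetical filter: keep lines containing "match".
    static boolean shouldEmit(String value, List<String> all) {
        return value.contains("match");
    }

    public static void main(String[] args) {
        MiniContext ctx = new MiniContext(List.of("match one", "skip", "match two"));
        run(ctx);
        System.out.println(ctx.output);  // [match one, match two]
    }
}
```

In real Hadoop code the same shape applies to the new-API Mapper: call setup(context), loop on context.nextKeyValue() (copying each Writable, since the framework reuses the objects), then map(...) for the chosen records and finally cleanup(context).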
On Wed, Oct 12, 2011 at 3:18 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> That would certainly seem to be the easiest way out, the only
> downside being that you'd have to cache all your values in memory.
> If you plug in deeper, at the RecordReader level (which provides
> the specific nextKeyValue(…) methods), you could instead keep just a
> list of offsets of all successful line matches and re-read the whole
> split in a second pass. This would cost slightly more I/O as you seek
> through the split again, but the benefit would be lower memory
> consumption -- if that is a concern here.
> [Or go the longer way, and use the Reducer phase!]
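The offset-replay idea above can be sketched with a plain file standing in for the input split; the `contains("match")` predicate is a hypothetical placeholder for the real matching logic:

```java
// Sketch of the lower-memory alternative: pass 1 records only the byte
// offsets of matching lines; pass 2 seeks back and re-reads just those
// lines, so no line contents are cached between passes.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class OffsetReplayDemo {

    public static List<String> replayMatches(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            // Pass 1: remember the offsets of matching lines, not the lines.
            long pos = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                if (line.contains("match")) {   // hypothetical predicate
                    offsets.add(pos);
                }
                pos = raf.getFilePointer();
            }
            // Pass 2: seek back and re-read only the matching lines.
            List<String> matches = new ArrayList<>();
            for (long off : offsets) {
                raf.seek(off);
                matches.add(raf.readLine());
            }
            return matches;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("split", ".txt");
        Files.writeString(tmp, "match a\nskip\nmatch b\n");
        System.out.println(replayMatches(tmp));  // [match a, match b]
        Files.delete(tmp);
    }
}
```

The trade-off is exactly as described in the mail: a second read of the split in exchange for holding only a list of longs in memory.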
> On Wed, Oct 12, 2011 at 5:14 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
> > Thanks for the fast reply!
> > I've dug into the code a little, and it seems I can achieve my
> > goal by overriding the Mapper.run method: just iterate over the whole
> > split using context.nextKeyValue() and then call map only with the
> > values I want to emit.
> > Since I'm a novice Hadooper, am I thinking about this the wrong way?
> > thanks again,
> > yaron
> > On Wed, Oct 12, 2011 at 12:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >> Hello Yaron,
> >> Yes, this is possible to do.
> >> You need to plug your own RecordReader implementation into the job,
> >> to control what is emitted and what is done before key-value pair
> >> data is fed into map(…).
> >> On Wed, Oct 12, 2011 at 2:42 PM, Yaron Gonen <[EMAIL PROTECTED]>
> >> wrote:
> >> > Hi,
> >> > The map method in the Mapper gets as a parameter a single line from
> >> > the split. Is there a way for Mappers to get the whole split as input?
> >> > I'd like to scan the whole split before deciding which key-value pairs
> >> > to emit to the reducer.
> >> > Thanks
> >> > yaron
> >> >
> >> --
> >> Harsh J
> Harsh J