Hadoop >> mail # user >> isSplitable() problem

Re: isSplitable() problem
The current code guarantees that the lines will be received in order. There are some patches that are likely to go in soon that would allow the JVM itself to be reused. In those cases I believe that a new instance of the mapper class would be created for each task, so the only thing you would have to worry about would be static values that are updated while processing the data.
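Here is a minimal plain-Java sketch of the distinction above. It assumes that JVM reuse simply means several tasks run one after another in the same JVM, each constructing a fresh mapper object. The class names `LineCountingMapper` and `JvmReuseDemo` are hypothetical, and no Hadoop classes are used. Instance fields start from scratch for each task, while a static field keeps accumulating across tasks.

```java
import java.util.List;

// Hypothetical stand-in for a mapper class (not a Hadoop class).
class LineCountingMapper {
    // Instance field: starts at 0 for every new mapper instance, i.e. every task.
    int linesThisTask = 0;
    // Static field: shared by all instances created in this JVM, so with
    // JVM reuse its value carries over from one task into the next.
    static int linesThisJvm = 0;

    void map(String line) {
        linesThisTask++;
        linesThisJvm++;
    }
}

public class JvmReuseDemo {
    // Simulates one task: a new mapper instance processes its split in order.
    static int[] runTask(List<String> split) {
        LineCountingMapper mapper = new LineCountingMapper();
        for (String line : split) {
            mapper.map(line);
        }
        return new int[] {mapper.linesThisTask, LineCountingMapper.linesThisJvm};
    }

    public static void main(String[] args) {
        int[] first = runTask(List.of("a", "b", "c"));
        int[] second = runTask(List.of("d", "e"));
        System.out.println("task 1: instance=" + first[0] + ", static=" + first[1]);
        System.out.println("task 2: instance=" + second[0] + ", static=" + second[1]);
    }
}
```

Running `main` prints `task 1: instance=3, static=3` and then `task 2: instance=2, static=5`. The static count leaks from the first task into the second, and that is the kind of state to watch for once JVM reuse is enabled.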

-- Bobby Evans

On 4/24/12 4:45 AM, "Dan Drew" <[EMAIL PROTECTED]> wrote:

I have chosen to use Jay's suggestion as a quick workaround and am pleased
to report that it seems to work well on small test inputs.

My question now is, are the mappers guaranteed to receive the file's lines
in order?

Browsing the source suggests this is so, but I just want to make sure, as my
understanding of Hadoop is insubstantial.

Thank you for your patience in answering my questions.

On 23 April 2012 14:28, Harsh J <[EMAIL PROTECTED]> wrote:

> Jay,
>
> On Mon, Apr 23, 2012 at 6:43 PM, JAX <[EMAIL PROTECTED]> wrote:
> > Curious: it seems like you could aggregate the results in the mapper as a
> > local variable or list of strings. Is there a way to know that your
> > mapper has just read the LAST line of an input split?
>
> True. That can be one way to do it (unless the aggregation of 'records'
> needs to happen live and you don't wish to store it all in memory).
>
> > Is there a "cleanup" or "finalize" method in mappers that is run at the
> > end of a whole stream read to support this sort of chunked, in-memory
> > map/reduce operation?
>
> Yes there is. See:
>
> Old API:
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Mapper.html
> (See Closeable's close())
>
> New API:
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup(org.apache.hadoop.mapreduce.Mapper.Context)
>
>
> --
> Harsh J
>
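The new-API `Mapper.run()` calls `setup()` once, then `map()` for each record of the split, then `cleanup()` once. A mapper can therefore buffer lines in an instance field and emit the aggregate from `cleanup()`, which answers the "last line of the split" question.

The sketch below models that call order in plain Java without Hadoop. The names `SplitAggregatingMapper` and `MapperLifecycleDemo` are hypothetical, and a `List<String>` stands in for the `Context` output. As noted above, this approach holds every line of the split in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a new-API mapper: same method names as
// org.apache.hadoop.mapreduce.Mapper, but a plain List replaces Context.
class SplitAggregatingMapper {
    private StringBuilder buffer;

    void setup() {
        // Called once, before the first record of the split.
        buffer = new StringBuilder();
    }

    void map(long offset, String line) {
        // Called once per record, in order; the offset key is unused here.
        if (buffer.length() > 0) {
            buffer.append('\n');
        }
        buffer.append(line);
    }

    void cleanup(List<String> output) {
        // Called once, after the last record: the "end of split" hook.
        output.add(buffer.toString());
    }

    // Mirrors the order in which Mapper.run() drives the three methods.
    void run(List<String> splitLines, List<String> output) {
        setup();
        long offset = 0;
        for (String line : splitLines) {
            map(offset, line);
            offset += line.length() + 1; // assumes 1-byte chars and '\n' endings
        }
        cleanup(output);
    }
}

public class MapperLifecycleDemo {
    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        new SplitAggregatingMapper().run(List.of("first", "second", "last"), output);
        System.out.println(output.size() + " record(s) emitted from cleanup()");
    }
}
```

After `run()`, `output` holds a single element: the three lines joined by newlines, added only after the last `map()` call. With the old API the same idea applies to `close()`. However, `close()` receives no `OutputCollector`, so a common workaround is to keep a reference to the collector passed into `map()`.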