sam liu 2014-01-02, 09:42
Re: What are the methods to share dynamic data among mappers/reducers?
Vinod Kumar Vavilapalli 2014-01-02, 18:21
There isn't anything natively supported for that in the framework, but you can do it yourself by using a shared service (e.g. HDFS files or ZooKeeper nodes) that all mappers/reducers have access to.
Can you share more details on your use case? In any case, once you start making mappers and reducers depend on externally changing state or on each other, you may be breaking fundamental assumptions of MapReduce: embarrassingly parallel computation (which limits scalability) and/or idempotency (which affects retries after failures).
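To make the retry concern concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: a local temporary file stands in for the shared HDFS file or ZooKeeper node, and the names `publish`, `read_shared`, and `map_record` are hypothetical helpers, not part of any Hadoop API.

```python
import json
import os
import tempfile


def publish(path, value):
    """Replace the shared JSON document at `path` in one step."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(value, f)
    # Rename over the old file so readers never see a half-written document.
    # (Assumption: a local file system; a real shared store has its own rules.)
    os.replace(tmp, path)


def read_shared(path):
    """Return the current contents of the shared document."""
    with open(path) as f:
        return json.load(f)


def map_record(record, shared):
    """Toy mapper: scale the record by a factor taken from shared state."""
    return record * shared["factor"]


# Stand-in for a location that every task can reach.
shared_path = os.path.join(tempfile.mkdtemp(), "shared.json")

publish(shared_path, {"factor": 2})
first_attempt = map_record(10, read_shared(shared_path))

# The shared value changes while the job is still running...
publish(shared_path, {"factor": 3})
# ...so a retried attempt of the same task computes a different result.
retry_attempt = map_record(10, read_shared(shared_path))
```

The two attempts process the same input record but disagree, which is the loss of idempotency described above. One way to limit the damage, if it suits the use case, is to have each job read one fixed version of the shared data rather than whatever is current at the moment of the call.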
On Jan 2, 2014, at 1:42 AM, sam liu <[EMAIL PROTECTED]> wrote:
> As far as I know, the Distributed Cache copies the shared data to the slaves before the job starts and does not change it after that.
> So are there any solutions for sharing dynamic data among mappers/reducers?