MR does not read the file data on the front-end (unless a partitioner
such as the TotalOrderPartitioner (TOP) demands it). The actual
block-level read is done via the DFSClient class and its associated
DFSInputStream and DFSOutputStream classes; DFSInputStream is the one
you are interested in.
All MR cares about is scheduling tasks close to their data, so it just
takes the block locations (metadata), builds split objects from them for
the scheduler and the tasks, and sends those across.
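
To illustrate the point above, here is a rough, self-contained sketch of how
splits can be built from block metadata alone. It is not the Hadoop source:
Block, Split and splitsFor are hypothetical names I made up for this mail, and
the path and datanode names are invented. In real Hadoop the equivalent work
is done, as far as I recall, by FileInputFormat.getSplits(), which asks
FileSystem.getFileBlockLocations() for the locations and wraps them in
FileSplit objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitSketch {

    /** Hypothetical stand-in for the per-block metadata kept by the NameNode. */
    static final class Block {
        final long offset;     // byte offset of the block within the file
        final long length;     // number of bytes in the block
        final String[] hosts;  // datanodes that hold a replica

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Hypothetical stand-in for a split: a byte range plus preferred hosts. */
    static final class Split {
        final String path;
        final long start;
        final long length;
        final String[] hosts;

        Split(String path, long start, long length, String[] hosts) {
            this.path = path;
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length + " " + Arrays.toString(hosts);
        }
    }

    /** Builds one split per block from metadata only; no file data is read. */
    static List<Split> splitsFor(String path, List<Block> blocks) {
        List<Split> splits = new ArrayList<>();
        for (Block b : blocks) {
            splits.add(new Split(path, b.offset, b.length, b.hosts));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Invented example: a 74 MB file stored as one 64 MB and one 10 MB block.
        List<Block> blocks = Arrays.asList(
                new Block(0L, 64L << 20, "dn1", "dn2", "dn3"),
                new Block(64L << 20, 10L << 20, "dn2", "dn4", "dn5"));
        for (Split s : splitsFor("/user/example/input.txt", blocks)) {
            System.out.println(s);
        }
    }
}
```

The scheduler only needs the hosts in each split to place a task. The bytes
themselves are fetched later, on the task side, when the record reader opens
the file through the FileSystem API and ends up in DFSInputStream.
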
On Thu, Sep 13, 2012 at 5:40 AM, Vivi Lang <[EMAIL PROTECTED]> wrote:
> Hi all,
> Is there anyone who can tell me, when we launch a mapreduce task, for
> example, wordcount, after the JobClient obtained the block locations (the
> related hosts/datanodes are stored in the specified split), which
> function/class will be called for reading those blocks from the datanode?