Re: Question about
MR does not read the files on the front-end (unless a partitioner such
as the TotalOrderPartitioner demands it for input sampling). The actual
block-level reads are done via the DFSClient class and its companion
DFSInputStream and DFSOutputStream classes; the first one is where your
interest lies.
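As a minimal sketch of that read path (the namenode address and file
path below are hypothetical, and this assumes a standard Hadoop client
classpath), this is roughly what a record reader does when a map task
starts: fs.open() on an HDFS path hands back an FSDataInputStream that
is backed by DFSInputStream underneath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadSplitSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical file; any HDFS path would do.
    Path path = new Path("hdfs://namenode:8020/data/input.txt");
    FileSystem fs = path.getFileSystem(conf);
    FSDataInputStream in = fs.open(path); // DFSInputStream underneath on HDFS
    in.seek(0L);                          // a real reader seeks to its split's start offset
    byte[] buf = new byte[4096];
    int n = in.read(buf);                 // the actual block-level read
    System.out.println("read " + n + " bytes");
    in.close();
  }
}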

All MR cares about is scheduling the tasks close to the data, so it
just takes the block locations (metadata), conjures up split objects
for the scheduler and the tasks, and sends those across.
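To make the metadata-only step concrete, here is a hedged sketch (again
with a hypothetical path) of what FileInputFormat.getSplits() boils
down to: asking the NameNode where the blocks live and wrapping each
location into a FileSplit, without ever touching file data:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SplitSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("hdfs://namenode:8020/data/input.txt"); // hypothetical
    FileSystem fs = path.getFileSystem(conf);
    FileStatus stat = fs.getFileStatus(path);
    // Locations come from NameNode metadata; no datanode is contacted.
    BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
    for (BlockLocation block : blocks) {
      // One split per block; the hosts are hints for data-local scheduling.
      FileSplit split = new FileSplit(path, block.getOffset(),
                                      block.getLength(), block.getHosts());
      System.out.println(split + " on hosts "
          + String.join(",", block.getHosts()));
    }
  }
}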

On Thu, Sep 13, 2012 at 5:40 AM, Vivi Lang <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Can anyone tell me: when we launch a MapReduce task, for example
> wordcount, after the JobClient has obtained the block locations (the
> related hosts/datanodes are stored in the corresponding split), which
> function/class will be called to read those blocks from the datanodes?
>
> Thanks,
> Vivian

--
Harsh J