Vivi Lang 2012-09-13, 00:10
Hi all,
Can anyone tell me: when we launch a MapReduce job (for example, wordcount), after the JobClient has obtained the block locations (the related hosts/datanodes are stored in the corresponding split), which function/class is called to read those blocks from the datanode?
Thanks, Vivian
Charles Baker 2012-09-13, 00:56
Hi Vivian. Take a look at TextInputFormat and the RecordReader classes. The input format is set via JobConf.setInputFormat().
-Chuck
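To make the RecordReader's role concrete, here is a minimal, standard-library-only sketch of how a line-oriented reader can process one split. It is an illustration under assumptions, not Hadoop's actual code: the class `SplitLineReaderSketch` and the method `readSplit` are hypothetical names, and the file is modelled as an in-memory byte array rather than an HDFS stream. The idea it shows is that a split not starting at offset 0 skips the partial first line, and that every split finishes the last line it started, even if that line runs past the split's end.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a line-oriented record reader.
// The "file" is a byte array; a real reader would seek on an input stream.
class SplitLineReaderSketch {

    // Returns the lines owned by the split [start, start + length) of data.
    static List<String> readSplit(byte[] data, long start, long length) {
        long end = Math.min(start + length, data.length);
        int pos = (int) start;

        if (start != 0) {
            // Back up one byte so that a line beginning exactly at 'start'
            // is kept, then skip through the next newline. Any partial line
            // is left to the previous split, which reads past its own end.
            pos = (int) start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }

        List<String> lines = new ArrayList<>();
        // Read every line that starts before 'end'; the last one may
        // extend beyond 'end' and is still read in full.
        while (pos < end) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        // Two 13-byte splits; the boundary falls inside "charlie".
        System.out.println(readSplit(data, 0, 13));
        System.out.println(readSplit(data, 13, 13));
    }
}
```

With the sample data in `main`, the first split yields alpha, bravo and charlie, and the second yields only delta, so each line is processed exactly once even though the boundary cuts through "charlie".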
Harsh J 2012-09-13, 16:31
MR does not read the files in the front-end (unless a partitioner such as the TotalOrderPartitioner (TOP) demands it). The actual block-level read is done via the DFSClient class and its companion classes DFSInputStream and DFSOutputStream; the first of these should be where your interest lies.
All MR cares about is scheduling tasks close to their data, so it just takes the block locations (metadata) to conjure up split objects for the scheduler and the task, and sends them across.
-- Harsh J
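The point about splits carrying only location metadata can be sketched as follows. This is a simplified illustration, not Hadoop's implementation: `Block`, `Split`, `planSplits` and `isLocal` are hypothetical names, and the sketch assumes one split per block, whereas the real split size is configurable and need not equal the block size. No file bytes are touched here; only offsets, lengths and host names are passed around.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of turning block metadata into split descriptors.
class SplitPlanningSketch {

    // Stand-in for the block metadata a client obtains from the NameNode.
    static final class Block {
        final long offset;
        final long length;
        final String[] hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Stand-in for a file split: a byte range plus the hosts that hold it.
    static final class Split {
        final String path;
        final long start;
        final long length;
        final String[] hosts;

        Split(String path, long start, long length, String[] hosts) {
            this.path = path;
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Assumption: one split per block. Only metadata is copied.
    static List<Split> planSplits(String path, List<Block> blocks) {
        List<Split> splits = new ArrayList<>();
        for (Block b : blocks) {
            splits.add(new Split(path, b.offset, b.length, b.hosts));
        }
        return splits;
    }

    // A scheduler can prefer a node that already stores the split's data.
    static boolean isLocal(Split split, String host) {
        return Arrays.asList(split.hosts).contains(host);
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block(0L, 64L, "dn1", "dn2"),
                new Block(64L, 36L, "dn2", "dn3"));
        for (Split s : planSplits("/input/words.txt", blocks)) {
            System.out.println(s.start + "+" + s.length + " on " + Arrays.toString(s.hosts)
                    + " local to dn1: " + isLocal(s, "dn1"));
        }
    }
}
```

The actual read happens later, inside the task, when the record reader opens the file and seeks to the split's start offset.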