Re: Retrieve and compute input splits
Peyman Mohajerian 2013-09-27, 23:02
For the JobClient to compute the input splits, doesn't it need to contact the
NameNode? Only the NameNode knows where the blocks are, so how can the client
compute the splits without that additional call?
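The client does talk to the NameNode while computing splits, but only for metadata (file lengths and block locations), never for file contents. Below is a simplified, plain-Java model of the split arithmetic in the new-API `FileInputFormat.getSplits()`. It is written to run without Hadoop on the classpath, so `SplitSketch`, `BlockInfo` and `Split` are hypothetical stand-ins. In real Hadoop the block metadata comes from `FileSystem.getFileBlockLocations()` and a split is a `FileSplit`. The sketch also skips details such as non-splittable files and ranking hosts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of FileInputFormat-style split computation.
// Input: block metadata the client would fetch from the NameNode. Output: split descriptions.
class SplitSketch {

    /** Metadata for one block: byte offset, length, and the DataNodes holding a replica. */
    record BlockInfo(long offset, long length, List<String> hosts) {}

    /** A split is only a description (file, start, length, preferred hosts); it holds no data. */
    record Split(String path, long start, long length, List<String> hosts) {}

    // Let the last split be up to 10% larger than splitSize instead of leaving a tiny tail split.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns the block that contains the given byte offset. */
    static BlockInfo blockAt(List<BlockInfo> blocks, long offset) {
        for (BlockInfo b : blocks) {
            if (offset >= b.offset() && offset < b.offset() + b.length()) {
                return b;
            }
        }
        throw new IllegalArgumentException("offset " + offset + " is outside the file");
    }

    static List<Split> computeSplits(String path, long fileLength, long blockSize,
                                     List<BlockInfo> blocks, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        // Cut full-size splits while more than SPLIT_SLOP * splitSize bytes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            long start = fileLength - remaining;
            splits.add(new Split(path, start, splitSize, blockAt(blocks, start).hosts()));
            remaining -= splitSize;
        }
        // Whatever is left becomes the final (possibly slightly oversized) split.
        if (remaining > 0) {
            long start = fileLength - remaining;
            splits.add(new Split(path, start, remaining, blockAt(blocks, start).hosts()));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file stored as 128 MB blocks: 128 + 128 + 44 MB.
        List<BlockInfo> blocks = List.of(
            new BlockInfo(0, 128 * mb, List.of("dn1", "dn2")),
            new BlockInfo(128 * mb, 128 * mb, List.of("dn2", "dn3")),
            new BlockInfo(256 * mb, 44 * mb, List.of("dn1", "dn3")));
        for (Split s : computeSplits("/data/input.txt", 300 * mb, 128 * mb, blocks, 1, Long.MAX_VALUE)) {
            System.out.println(s);
        }
    }
}
```

With the 300 MB example in `main`, this yields three splits (starting at 0, 128 MB and 256 MB), each tagged with the hosts of the block it starts in. Only this small amount of metadata is exchanged.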
On Fri, Sep 27, 2013 at 1:41 AM, Sonal Goyal <[EMAIL PROTECTED]> wrote:
> The input splits themselves are not copied; only the information about the
> location of the splits is copied to the jobtracker, so that it can assign
> tasktrackers that are local to each split.
> Check the Job Initialization section at
> To create the list of tasks to run, the job scheduler first retrieves the
> input splits computed by the JobClient from the shared filesystem (step
> 6). It then creates one map task for each split. The number of reduce tasks
> to create is determined by the mapred.reduce.tasks property in the JobConf,
> which is set by the setNumReduceTasks() method, and the scheduler simply
> creates this number of reduce tasks to be run. Tasks are given IDs at this
> Best Regards,
> Nube Technologies <http://www.nubetech.co>
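The job-initialization step in the quoted passage can be modelled in a few lines. This is a hypothetical, self-contained Java sketch, not JobTracker code: `JobInitSketch`, `SplitInfo`, `Task` and `initializeTasks` are made-up names, and a plain `Map` stands in for `JobConf`. It shows that the scheduler needs only the split descriptions (path, offset, length, hosts) to create one map task per split. The reduce count comes straight from `mapred.reduce.tasks`; falling back to 1 when the property is absent is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of job initialization: one map task per split,
// plus as many reduce tasks as the mapred.reduce.tasks property requests.
class JobInitSketch {

    /** Split metadata as read back from the shared filesystem; it contains no records. */
    record SplitInfo(String path, long start, long length, List<String> hosts) {}

    /** A schedulable task; map tasks keep the split (and its hosts), reduce tasks have none. */
    record Task(String type, int index, SplitInfo split) {}

    static List<Task> initializeTasks(List<SplitInfo> splits, Map<String, String> jobConf) {
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < splits.size(); i++) {
            tasks.add(new Task("map", i, splits.get(i)));
        }
        // Assumed default of 1 when the property is not set.
        int numReduces = Integer.parseInt(jobConf.getOrDefault("mapred.reduce.tasks", "1"));
        for (int i = 0; i < numReduces; i++) {
            tasks.add(new Task("reduce", i, null));
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<SplitInfo> splits = List.of(
            new SplitInfo("/data/input.txt", 0, 128, List.of("dn1", "dn2")),
            new SplitInfo("/data/input.txt", 128, 128, List.of("dn2", "dn3")),
            new SplitInfo("/data/input.txt", 256, 44, List.of("dn1", "dn3")));
        for (Task t : initializeTasks(splits, Map.of("mapred.reduce.tasks", "2"))) {
            System.out.println(t.type() + " " + t.index());
        }
    }
}
```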
> On Fri, Sep 27, 2013 at 10:55 AM, Sai Sai <[EMAIL PROTECTED]> wrote:
>> I have attached the "anatomy of a MapReduce job run" diagram from the
>> Definitive Guide.
>> In step 6 it says the JT/scheduler retrieves the input splits computed by
>> the client from HDFS.
>> The line above says that the client computes the input splits.
>> 1. Why does the JT/scheduler retrieve the input splits, and what does it
>> If it retrieves an input split, does this mean it goes to the block,
>> reads each record, and sends the records back to the JT? If so, that is a
>> lot of data movement for large files, which defeats data locality, so I'm
>> getting confused.
>> 2. How does the client know how to calculate the input splits?
>> Any help would be appreciated.
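For question 1, here is a hedged illustration of why "retrieving the splits" moves no record data: the scheduler only compares each split's host list with the host of the tasktracker asking for work. `LocalitySketch`, `SplitInfo` and `assign` are hypothetical names. The real scheduler also distinguishes rack-local from off-rack placement, which this sketch collapses into a single "any pending split" fallback.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of data-local map-task assignment. Only each split's
// host list is inspected; the split's blocks are never opened.
class LocalitySketch {

    record SplitInfo(String path, long start, long length, List<String> hosts) {}

    /**
     * Picks a pending split for the tasktracker on trackerHost: prefer a split with a
     * replica on that host, otherwise take any pending split. The chosen split is
     * removed from the list. Returns null when nothing is pending.
     */
    static SplitInfo assign(List<SplitInfo> pending, String trackerHost) {
        for (Iterator<SplitInfo> it = pending.iterator(); it.hasNext(); ) {
            SplitInfo s = it.next();
            if (s.hosts().contains(trackerHost)) {
                it.remove();
                return s;
            }
        }
        return pending.isEmpty() ? null : pending.remove(0);
    }

    public static void main(String[] args) {
        List<SplitInfo> pending = new ArrayList<>(List.of(
            new SplitInfo("/data/input.txt", 0, 128, List.of("dn1", "dn2")),
            new SplitInfo("/data/input.txt", 128, 128, List.of("dn2", "dn3"))));
        System.out.println(assign(pending, "dn3")); // data-local: the split starting at 128
        System.out.println(assign(pending, "dn4")); // nothing local left: falls back to the split at 0
    }
}
```

The tasktracker that receives a split then reads the records itself, ideally from a replica on its own node, so the records never pass through the jobtracker.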