MapReduce >> mail # user >> Re: Retrieve and compute input splits


Re: Retrieve and compute input splits
Hi
I have attached the anatomy of an MR job run from Hadoop: The Definitive Guide.

Step 6 says that the JT/scheduler retrieves the input splits, computed by the client, from HDFS.

That line implies that it is the client that computes the input splits.
1. Why does the JT/scheduler retrieve the input splits, and what does it do with them?
If it retrieves an input split, does that mean it goes to the block, reads each record,
and brings the records back to the JT? If so, that is a lot of data movement for large files,
which works against data locality, so I am getting confused.

2. How does the client know how to calculate the input splits?
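
To make question 2 concrete, here is a minimal sketch of how splits could be computed from file metadata alone, without reading any records. It assumes the split-size rule that FileInputFormat uses, max(minSize, min(maxSize, blockSize)), and ignores details such as the slack the real implementation allows on the last split. The names Split, compute_split_size and compute_splits are hypothetical and are not Hadoop's API; the block-to-host mapping is passed in by hand rather than looked up from the NameNode.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for a file split: metadata only, no record data."""
    path: str
    start: int               # byte offset where the split begins
    length: int              # number of bytes covered by the split
    hosts: Tuple[str, ...]   # nodes holding the block, usable as a locality hint


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    # Split-size rule assumed from FileInputFormat:
    # max(minSize, min(maxSize, blockSize)).
    return max(min_size, min(max_size, block_size))


def compute_splits(
    path: str,
    file_len: int,
    block_size: int,
    block_hosts: Sequence[Sequence[str]],
    min_size: int = 1,
    max_size: int = 2**63 - 1,
) -> List[Split]:
    """Cut a file into (offset, length, hosts) triples using only its
    length and block layout; the file contents are never opened."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits: List[Split] = []
    offset = 0
    while offset < file_len:
        length = min(split_size, file_len - offset)
        # Hosts of the block that contains the first byte of this split.
        block_index = offset // block_size
        splits.append(Split(path, offset, length, tuple(block_hosts[block_index])))
        offset += length
    return splits


if __name__ == "__main__":
    MB = 1024 * 1024
    # Hypothetical 300 MB file stored as three blocks of up to 128 MB.
    hosts = [["node1", "node2"], ["node2", "node3"], ["node1", "node3"]]
    for s in compute_splits("/data/input.txt", 300 * MB, 128 * MB, hosts):
        print(s)
```

Under these assumptions each split is only a small description (path, offset, length, host list), so whatever retrieves the splits would be moving a few bytes per split rather than the records themselves.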

Any help would be appreciated.
Thanks
Sai