

Re: Hadoop processing
Hello Andy,

     Just to add to what Mr. Jay has said, the MR framework does its best to
run each map task on a node where its input data is present. Sometimes,
however, none of the nodes hosting a replica of the data block for a map
task's input split (how many there are depends on the replication factor)
has a free map slot. In that case, the job scheduler looks for a free map
slot on a node in the same rack as one of the replicas. Very occasionally
even this is not possible, so an off-rack node is used and the block has to
be read across racks. So nodes that hold no data yet can still be given map
tasks.
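
As a rough illustration of the fallback order described above (data-local, then rack-local, then off-rack), here is a minimal Python sketch. It is not Hadoop code: Node, pick_node and the slot counts are hypothetical names made up for this example, and the real scheduler hands out tasks when a TaskTracker reports free slots rather than searching the cluster per task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a DataNode/TaskTracker pair."""
    name: str
    rack: str
    free_map_slots: int


def pick_node(replica_hosts, nodes):
    """Pick a node for one map task.

    replica_hosts: names of the nodes holding a replica of the input block.
    Returns (node, locality), or (None, None) when no map slot is free.
    """
    free = [n for n in nodes if n.free_map_slots > 0]

    # 1. Prefer a node that already holds a replica of the block.
    for n in free:
        if n.name in replica_hosts:
            return n, "data-local"

    # 2. Otherwise, a node in the same rack as one of the replicas.
    replica_racks = {n.rack for n in nodes if n.name in replica_hosts}
    for n in free:
        if n.rack in replica_racks:
            return n, "rack-local"

    # 3. Otherwise, any node with a free slot, even on another rack.
    if free:
        return free[0], "off-rack"
    return None, None


# Two freshly added nodes (new1, new2) hold no data; the three replica
# holders are all busy, so the task lands on the empty node in rack r1.
cluster = [
    Node("dn1", "r1", 0),
    Node("dn2", "r2", 0),
    Node("dn3", "r2", 0),
    Node("new1", "r1", 1),
    Node("new2", "r3", 1),
]
node, locality = pick_node({"dn1", "dn2", "dn3"}, cluster)
print(node.name, locality)
```

In this toy cluster the new, empty nodes receive work only because the replica holders have no free slots, which is the situation the paragraph above describes.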

Regards,
    Mohammad Tariq

On Thu, Nov 8, 2012 at 8:19 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:

> Hmm, this is interesting. I think that:
>
> 1) For the map phase, Hadoop is smart enough to try to run mappers
> locally, but I think you could force these DNs to actively participate in a
> map job by decreasing the size of the input splits. That would allow for
> many more mappers, some of which would be forced to run on input that is
> not necessarily local. In this scenario, those DNs don't yet have any
> local files on them that could be used for the input.
>
> 2) For the reduce phase, since the reducers copy mapper outputs from all
> over the cluster, one would expect your data nodes to naturally take part
> in this portion of the job if the number of reducers (for example,
> mapred.reduce.tasks) is specified.
>
>
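
To put rough numbers on point 1 above, the sketch below estimates how many map tasks a single file produces for a given split size. It is an approximation under stated assumptions: estimate_map_tasks is a hypothetical helper, the file and split sizes are made-up examples, and the actual split calculation in Hadoop's input formats has extra details (such as a tolerance for a slightly oversized last split) that are ignored here.

```python
def estimate_map_tasks(file_size_bytes, split_size_bytes):
    """Approximate number of input splits (one map task each) for one file."""
    if file_size_bytes <= 0:
        return 0
    if split_size_bytes <= 0:
        raise ValueError("split size must be positive")
    # Integer ceiling division: a trailing partial split still needs a mapper.
    return (file_size_bytes + split_size_bytes - 1) // split_size_bytes


MIB = 1024 * 1024
file_size = 1024 * MIB  # hypothetical 1 GiB input file

# Shrinking the split from 64 MiB to 16 MiB quadruples the mapper count.
print(estimate_map_tasks(file_size, 64 * MIB))
print(estimate_map_tasks(file_size, 16 * MIB))
```

More mappers means more tasks competing for the map slots on the nodes that hold the data, so more of them are likely to spill over to nodes without a local copy.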
> On Thu, Nov 8, 2012 at 9:35 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
>
>>  Hadoopers,
>>
>> “Hadoop ships the code to the data instead of sending the data to the
>> code.”
>>
>> Say you added two DNs/TTs to the cluster. They have no data at this
>> point, i.e. you have not run the balancer.
>>
>> In view of the above quoted statement, will these two nodes not
>> participate in the MapReduce job until you have balanced some data onto
>> those nodes? Please kindly elaborate.
>>
>>
>>
>> Rgds,
>>
>> AK47
>>
>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>