MapReduce >> mail # user >> Separate communications of HDFS and MapReduce


Re: Separate communications of HDFS and MapReduce

On Apr 26, 2010, at 6:23 AM, Druilhe Remi wrote:
> For example, when I run the "wordcount" example, there are both HDFS communications and MapReduce communications, and I am not able to distinguish which packets belong to HDFS and which to MapReduce.

This shouldn't be too surprising given that the MapReduce job needs to talk to HDFS to determine input and to write output.

> One way could be to use odd port numbers for HDFS and even port numbers for MapReduce, but I think I would have to modify the source code.

The ports for the services are already separated out.  

In general, client -> server connections map out as:

MR -> MR, HDFS
HDFS -> HDFS

Given a small 3-node grid, a dump of which processes open which ports, and a list of the connections made between all the machines, it should be trivial to build a more detailed connection map.  [You can probably even do it as a MapReduce job. :) ]
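
As a rough illustration of that idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the host names, the port numbers, and the `LISTENERS` table are placeholders that you would fill in from your own process/port dump and cluster configuration, and `connection_map` is a hypothetical helper, not part of Hadoop. It labels each connection by the daemon listening on the destination port and counts client -> server pairs per subsystem.

```python
from collections import Counter

# Placeholder listening-port dump: (host, port) -> daemon name.
# Real values come from your own cluster (e.g. a netstat/lsof dump);
# the ports below are illustrative, not authoritative defaults.
LISTENERS = {
    ("node1", 8020): "NameNode",
    ("node1", 8021): "JobTracker",
    ("node2", 50010): "DataNode",
    ("node3", 50010): "DataNode",
    ("node2", 50060): "TaskTracker",
    ("node3", 50060): "TaskTracker",
}

# Which subsystem each daemon belongs to.
SUBSYSTEM = {
    "NameNode": "HDFS",
    "DataNode": "HDFS",
    "JobTracker": "MR",
    "TaskTracker": "MR",
}


def classify(host, port):
    """Return 'HDFS', 'MR', or 'unknown' for a server-side (host, port)."""
    daemon = LISTENERS.get((host, port))
    return SUBSYSTEM.get(daemon, "unknown")


def connection_map(connections):
    """Count (client subsystem, server subsystem) pairs.

    connections: iterable of (client_daemon, dst_host, dst_port), where
    client_daemon is the name of the process that opened the connection.
    """
    counts = Counter()
    for client_daemon, dst_host, dst_port in connections:
        client = SUBSYSTEM.get(client_daemon, "unknown")
        server = classify(dst_host, dst_port)
        counts[(client, server)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical observed connections.
    observed = [
        ("TaskTracker", "node1", 8021),   # MR -> MR
        ("TaskTracker", "node1", 8020),   # MR -> HDFS
        ("DataNode", "node1", 8020),      # HDFS -> HDFS
        ("DataNode", "node3", 50010),     # HDFS -> HDFS
    ]
    for (client, server), n in sorted(connection_map(observed).items()):
        print(f"{client} -> {server}: {n}")
```

The same lookup on the destination port is enough to tag individual packets in a capture as HDFS or MapReduce traffic, provided the table matches the ports your daemons are actually configured to use.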