Hadoop >> mail # dev >> Feature Proposal: We already have something coded for our research purposes and would like to contribute.

Re: Feature Proposal: We already have something coded for our research purposes and would like to contribute.
This looks interesting.
Do you have a blog or wiki page where I can read about your approach?

This would certainly be useful for planning network capacity, at least for me.
On Fri, Jul 19, 2013 at 2:45 PM, Lev <[EMAIL PROTECTED]> wrote:

> Hi!
>
> My colleague and I have implemented a logging system that collects reports
> about Hadoop network traffic in a centralized "Statistic Server". We
> collect information about Mapper Inputs, Reducer Inputs and HDFS Writes at
> the transfer level, rather than the total number of bytes per task (which
> is what counters do currently). We originally aimed this at building a
> system which would be able to keep track of network performance in the
> cluster in real-time so that scheduling adjustments can be made on the fly
> (hence the centralized "Statistic Server"; the system can also easily be
> configured to log the reports locally on each machine by adjusting the
> XML configuration files). We eventually used this system to investigate
> the effects of network speed on job running time, particularly in the
> context of clusters deployed across the Internet.
>
> We would like to gauge the Hadoop community's interest in this feature, as
> we would like to contribute it to the project. It is aimed mostly at
> research users (those who use Hadoop as a research platform, and also those
> who research the workings and performance of Hadoop itself; we are in the
> second category ourselves), although it might also be used by people who
> wish to analyze the data flow of the various stages of Hadoop computation
> in their jobs. In turn, this should enable a new way to discover possible
> optimizations for jobs.
>
> The feature will be disabled by default, and it has no effect on Hadoop when disabled.
>
> If there is interest, please let us know what we should elaborate on.
>
> Thanks,
> Lev Faerman and Aviad Pines.
>

--
Nitin Pawar
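
To make the distinction in the proposal concrete (reports at the transfer level versus a total number of bytes per task), here is a minimal Java sketch. It is only an illustration under stated assumptions: the names TransferLog, Kind, and Report, and every field in them, are hypothetical and are not taken from Hadoop or from the authors' patch, which is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from Hadoop or the proposed patch. */
final class TransferLog {

    /** The three traffic types named in the proposal. */
    enum Kind { MAPPER_INPUT, REDUCER_INPUT, HDFS_WRITE }

    /** One report per network transfer, rather than one total per task. */
    static final class Report {
        final String taskId;
        final Kind kind;
        final String sourceHost;
        final String destHost;
        final long bytes;
        final long startMillis;
        final long endMillis;

        Report(String taskId, Kind kind, String sourceHost, String destHost,
               long bytes, long startMillis, long endMillis) {
            this.taskId = taskId;
            this.kind = kind;
            this.sourceHost = sourceHost;
            this.destHost = destHost;
            this.bytes = bytes;
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }

        /** Observed throughput in bytes per second; 0 if the duration is not positive. */
        double bytesPerSecond() {
            long durationMillis = endMillis - startMillis;
            return durationMillis <= 0 ? 0.0 : bytes * 1000.0 / durationMillis;
        }
    }

    // Stand-in for the central collector; here the reports are just kept in memory.
    private final List<Report> reports = new ArrayList<>();

    void record(Report report) {
        reports.add(report);
    }

    /** What a per-task counter would show: the total bytes moved for a task. */
    long totalBytes(String taskId) {
        long total = 0;
        for (Report r : reports) {
            if (r.taskId.equals(taskId)) {
                total += r.bytes;
            }
        }
        return total;
    }

    /** What transfer-level logging adds: every individual transfer for a task. */
    List<Report> reportsFor(String taskId) {
        List<Report> matching = new ArrayList<>();
        for (Report r : reports) {
            if (r.taskId.equals(taskId)) {
                matching.add(r);
            }
        }
        return matching;
    }
}
```

In this sketch, two reducer-input transfers for the same task add up to a single per-task total, while the individual reports still show which source host delivered its data more slowly. That per-link detail is the kind of information a byte counter alone cannot provide.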