Re: Feature Proposal: We already have something coded for our research purposes and would like to contribute.
Nitin Pawar 2013-07-19, 09:26
This looks interesting.
Do you have a blog or wiki page where I can read about your approach?
This would certainly be useful for planning network capacity, at least for me.
On Fri, Jul 19, 2013 at 2:45 PM, Lev <[EMAIL PROTECTED]> wrote:
> My colleague and I have implemented a logging system that collects reports
> about Hadoop network traffic in a centralized "Statistic Server". We
> collect information about Mapper Inputs, Reducer Inputs and HDFS Writes at
> the transfer level, rather than the total number of bytes per task (which
> is what counters do currently). We originally aimed this at building a
> system which would be able to keep track of network performance in the
> cluster in real-time so that scheduling adjustments can be made on the fly
> (hence the centralized "Statistic Server", although the system can also
> easily log the reports locally on each machine if the XML configuration
> files are adjusted). We eventually used this system for investigating
> the effects of network speed on job running time, particularly in the
> context of clusters deployed across the Internet.
> We would like to gauge interest in the Hadoop community in this feature, as
> we would like to contribute this to the project. It is aimed mostly at
> research users (those who use Hadoop as a research platform, and also those
> who research the workings and performance of Hadoop itself; we are in the
> second category ourselves), although it might also be used by people who
> wish to analyze the data flow of the various stages of Hadoop computation
> in their jobs. In turn, this should enable a new way to discover possible
> optimizations for jobs.
> This has no effect on Hadoop when disabled, and it will be disabled by default.
> Please let us know whether there is any interest, and what, if anything, we
> should elaborate on further.
> Lev Faerman and Aviad Pines.