MapReduce, mail # user - TestDFSIO info required


Gaurav Dasgupta 2012-08-30, 07:14
Re: TestDFSIO info required
Gaurav Dasgupta 2012-08-30, 10:56
Hi All,

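The formula is actually:

    Throughput = (size * 1000) / (time * MEGA)
               = (1073741824000 * 1000) / (184793950 * 1048576)
               = 5.54130695296031 MB/s

where size is the total number of bytes written, time is in milliseconds, and MEGA = 1048576 bytes.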

The "time" is the sum of the "Exec Time" values of all map task attempts. Each value can be found in the task logs of the corresponding task attempt.
So, solved.
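The arithmetic above can be double-checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and it assumes `time` is in milliseconds, which is what the factor of 1000 in the formula implies. It also shows that total bytes over summed task time gives the same figure as one file's size (10240 MB, from `-fileSize 10240`) over the average per-task time.

```python
# Worked check of the TestDFSIO "Throughput mb/sec" formula with this run's numbers.
# Assumption: size is in bytes and time is in milliseconds.
MEGA = 1024 * 1024          # bytes per MB (1048576)

size_bytes = 1073741824000  # l:size from part-00000
time_ms = 184793950         # l:time: sum of all map tasks' exec times
n_tasks = 100               # l:tasks (one map task per file)

# size * 1000 / (time * MEGA): the "* 1000" converts milliseconds to seconds.
throughput = size_bytes * 1000 / (time_ms * MEGA)

# The same number seen per task: each map wrote one 10240 MB file
# in an average of time_ms / n_tasks milliseconds.
file_mb = size_bytes / MEGA / n_tasks
avg_task_s = time_ms / n_tasks / 1000
per_task_rate = file_mb / avg_task_s

print(f"throughput    = {throughput:.6f} MB/s")
print(f"avg task time = {avg_task_s:.1f} s")
print(f"per-task rate = {per_task_rate:.6f} MB/s")
```

Putting the wall-clock "Test exec time sec" (3490.168, already in seconds) into this formula mixes an elapsed cluster time with a formula that expects summed task milliseconds, which is why that first attempt came out tens of thousands of times too high.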

Thanks,
Gaurav Dasgupta

On Thu, Aug 30, 2012 at 12:44 PM, Gaurav Dasgupta <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I ran TestDFSIO on my Hadoop cluster:
> hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -write -nrFiles 100 -fileSize 10240
> The report generated is:
> 12/08/30 01:31:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
> 12/08/30 01:31:34 INFO fs.TestDFSIO:            Date & time: Thu Aug 30 01:31:34 CDT 2012
> 12/08/30 01:31:34 INFO fs.TestDFSIO:        Number of files: 100
> 12/08/30 01:31:34 INFO fs.TestDFSIO: Total MBytes processed: 1024000.0
> 12/08/30 01:31:34 INFO fs.TestDFSIO:      Throughput mb/sec: 5.54130695296031
> 12/08/30 01:31:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.875064849853516
> 12/08/30 01:31:34 INFO fs.TestDFSIO:  IO rate std deviation: 1.503623716482166
> 12/08/30 01:31:34 INFO fs.TestDFSIO:     Test exec time sec: 3490.168
>
> I was referring to this blog post:
>
> http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
>
> Based on my understanding of that post, I calculated Throughput = (1024000*1000)/3490.168 = 293395.61, which is of course not my throughput.
>
> Then I found a file in the job's HDFS output directory.
> "hadoop fs -cat /benchmarks/TestDFSIO/io_write/part-00000" gave me this:
>
> f:rate 587506.5
> f:sqrate 3677727.2
> l:size 1073741824000
> l:tasks 100
> l:time 184793950
>
> When I used this time in the formula instead, Throughput = (1024000*1000)/184793950 = 5.541, which matches my reported throughput.
>
> Can someone tell me what exactly this "time" value in the HDFS output file "part-00000" is?
>
> Thanks,
>
> Gaurav Dasgupta
>
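For reference, all three rate figures in the report can be recomputed from the five values in part-00000. The sketch below is an illustration, not TestDFSIO's own code: the helper names `parse_part_file` and `report` are hypothetical, and the assumption that each map task stores its rate (and squared rate) multiplied by 1000 is an inference about TestDFSIO's internals, chosen because it reproduces the reported Average IO rate and std deviation.

```python
import math

# The part-00000 contents quoted in the question ("f:" = float, "l:" = long).
PART_00000 = """\
f:rate 587506.5
f:sqrate 3677727.2
l:size 1073741824000
l:tasks 100
l:time 184793950
"""

MEGA = 1024 * 1024  # bytes per MB


def parse_part_file(text):
    """Hypothetical helper: turn 'type:name value' lines into {name: number}."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split()
        kind, name = key.split(":", 1)
        stats[name] = float(value) if kind == "f" else int(value)
    return stats


def report(stats):
    """Hypothetical helper: recompute the three rate lines of the report."""
    tasks = stats["tasks"]
    # Total bytes over summed per-task milliseconds, converted to MB/s.
    throughput = stats["size"] * 1000 / (stats["time"] * MEGA)
    # Assumption: the summed rate is scaled by 1000; undo it, then average.
    avg_rate = stats["rate"] / 1000 / tasks
    # Same scaling for squared rates; std dev = sqrt(E[r^2] - E[r]^2).
    std_dev = math.sqrt(abs(stats["sqrate"] / 1000 / tasks - avg_rate ** 2))
    return throughput, avg_rate, std_dev


throughput, avg_rate, std_dev = report(parse_part_file(PART_00000))
print(f"Throughput mb/sec:      {throughput}")
print(f"Average IO rate mb/sec: {avg_rate}")
print(f"IO rate std deviation:  {std_dev}")
```

Because every file has the same size, Throughput is effectively the harmonic mean of the per-task rates, while Average IO rate is their arithmetic mean; that is why the former (5.54) comes out lower than the latter (5.88). Small differences in the last digits against the report are expected, since these values pass through single-precision floats in the original job.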