MapReduce, mail # user - TestDFSIO info required


TestDFSIO info required
Gaurav Dasgupta 2012-08-30, 07:14
Hi,

I ran TestDFSIO in my Hadoop cluster:
hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -write -nrFiles 100 -fileSize 10240
The report generated is:
12/08/30 01:31:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
12/08/30 01:31:34 INFO fs.TestDFSIO:            Date & time: Thu Aug 30 01:31:34 CDT 2012
12/08/30 01:31:34 INFO fs.TestDFSIO:        Number of files: 100
12/08/30 01:31:34 INFO fs.TestDFSIO: Total MBytes processed: 1024000.0
12/08/30 01:31:34 INFO fs.TestDFSIO:      Throughput mb/sec: 5.54130695296031
12/08/30 01:31:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.875064849853516
12/08/30 01:31:34 INFO fs.TestDFSIO:  IO rate std deviation: 1.503623716482166
12/08/30 01:31:34 INFO fs.TestDFSIO:     Test exec time sec: 3490.168

I was referring to this blog post:

http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

As I understand it from that blog, I calculated the throughput from the test exec time: Throughput = (1024000 * 1000) / 3490.168 = 293395.61, which is of course not the throughput reported above.

Then I found a file in the HDFS output directory of the job:

hadoop fs -cat /benchmarks/TestDFSIO/io_write/part-00000

gave me this:

f:rate 587506.5
f:sqrate 3677727.2
l:size 1073741824000
l:tasks 100
l:time 184793950

Then I used the time from this file in the same formula: Throughput = (1024000 * 1000) / 184793950 = 5.541, which matches the throughput in my report.
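For reference, the arithmetic above can be checked with a short script. This is only a sketch: it assumes that l:time is the sum of the per-task I/O times in milliseconds, and that f:rate and f:sqrate are sums of per-task rates (and squared rates) scaled by 1000. Those assumptions are inferred from how well the numbers line up with the report, not confirmed anywhere in this thread. The function name summarize is made up for illustration.

```python
import math

# Values copied from /benchmarks/TestDFSIO/io_write/part-00000 above.
rate = 587506.5         # f:rate   (assumed: sum of per-task MB/s, times 1000)
sqrate = 3677727.2      # f:sqrate (assumed: sum of squared per-task MB/s, times 1000)
size = 1073741824000    # l:size   (total bytes written)
tasks = 100             # l:tasks  (number of files / map tasks)
time_ms = 184793950     # l:time   (assumed: summed per-task I/O time in ms)

MEGA = 1024 * 1024      # bytes per MB


def summarize(rate, sqrate, size, tasks, time_ms):
    """Recompute the report figures from the raw part-00000 values."""
    total_mb = size / MEGA
    # Same formula as in the message: (total MB * 1000) / time in ms.
    throughput = total_mb * 1000 / time_ms
    # Mean of the per-task rates, undoing the assumed factor of 1000.
    avg_rate = rate / 1000 / tasks
    # Standard deviation from the mean of squares minus the squared mean.
    std_dev = math.sqrt(abs(sqrate / 1000 / tasks - avg_rate ** 2))
    return total_mb, throughput, avg_rate, std_dev


total_mb, throughput, avg_rate, std_dev = summarize(rate, sqrate, size, tasks, time_ms)
print("Total MBytes processed:", total_mb)
print("Throughput mb/sec:     ", throughput)
print("Average IO rate mb/sec:", avg_rate)
print("IO rate std deviation: ", std_dev)
```

Under these assumptions the script reproduces all four figures from the report to within rounding. It would also mean each task spent about 184793950 / 100 = 1847939.5 ms (roughly 1848 seconds) on I/O, which is less than the 3490.168 seconds of wall-clock exec time. That is consistent with l:time being a per-task total rather than elapsed job time.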

Can someone tell me what exactly this time in the HDFS output directory file "part-00000" represents?

Thanks,

Gaurav Dasgupta
Gaurav Dasgupta 2012-08-30, 10:56