Re: Hadoop performance benchmarking with TestDFSIO
Steve Loughran 2011-09-29, 09:58
On 28/09/11 22:45, Sameer Farooqui wrote:
> Hi everyone,
>
> I'm looking for some recommendations for how to get our Hadoop cluster to do
> faster I/O.
>
> Currently, our lab cluster is 8 worker nodes and 1 master node (with
> NameNode and JobTracker).
>
> Each worker node has:
> - 48 GB RAM
> - 16 processors (Intel Xeon E5630 @ 2.53 GHz)
> - 1 Gb Ethernet connection
>
>
> Due to company policy, we have to keep the HDFS storage on a disk array. Our
> SAS-connected array is capable of 6 Gb/s (768 MB/s) for each of the 8 hosts.
> So, theoretically, we should be able to get a max of about 6 GB/s of
> simultaneous reads across the 8 nodes if we benchmark it.

You're missing the point of Hadoop there; you will end up getting the
bandwidth of the HDD most likely to fail next, HDFS's copy replication is
overkill on top of RAID, and you will hit scaling limits that are both
technical (SAN scalability) and financial.

>
> Our disk array is presenting each of the 8 nodes with a 21 TB LUN. The LUN
> is RAID-5 across 12 disks on the array. That LUN is partitioned on the
> server into 6 different devices like this:
>

> The file system type is ext3.

Set noatime on those filesystems; otherwise every read also triggers an
atime metadata write.
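For example, a minimal sketch (the device and mount point are placeholders
for however your six partitions are laid out):

   # remount a live data partition without atime updates
   mount -o remount,noatime /data/1

   # make it permanent with an /etc/fstab entry along the lines of:
   # /dev/sdb1  /data/1  ext3  defaults,noatime  0 0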

>
> So, when we run TestDFSIO, here are the results:
>
> *++ Write ++*
>>> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -write
> -nrFiles 80 -fileSize 10000
>
> 11/09/27 18:54:53 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
> 11/09/27 18:54:53 INFO fs.TestDFSIO:            Date & time: Tue Sep 27 18:54:53 EDT 2011
> 11/09/27 18:54:53 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/27 18:54:53 INFO fs.TestDFSIO:      Throughput mb/sec: 8.2742240008678
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Average IO rate mb/sec: 8.288116455078125
> 11/09/27 18:54:53 INFO fs.TestDFSIO:  IO rate std deviation: 0.3435565217052116
> 11/09/27 18:54:53 INFO fs.TestDFSIO:     Test exec time sec: 1427.856
>
> So, throughput across all 8 nodes is 8.27 * 80 = 661 MB per second.
>
>
> *++ Read ++*
>>> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -read
> -nrFiles 80 -fileSize 10000
>
> 11/09/27 19:43:12 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
> 11/09/27 19:43:12 INFO fs.TestDFSIO:            Date & time: Tue Sep 27 19:43:12 EDT 2011
> 11/09/27 19:43:12 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/27 19:43:12 INFO fs.TestDFSIO:      Throughput mb/sec: 5.854318503905489
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.96372652053833
> 11/09/27 19:43:12 INFO fs.TestDFSIO:  IO rate std deviation: 0.9885505979030621
> 11/09/27 19:43:12 INFO fs.TestDFSIO:     Test exec time sec: 2055.465
>
>
> So, throughput across all 8 nodes is 5.85 * 80 = 468 MB per second.
>
>
> *Question 1:* Why are the reads and writes so much slower than expected? Any
> suggestions about what can be changed? I understand that RAID-5 backed disks
> are an unorthodox configuration for HDFS, but has anybody successfully done
> this? If so, what kind of results did you see?
>
>
> Also, we detached the 8 nodes from the disk array and connected each of them
> to 6 local hard drives for testing (w/ ext4 file system). Then we ran the
> same read TestDFSIO and saw this:
>
> 11/09/26 20:24:09 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
> 11/09/26 20:24:09 INFO fs.TestDFSIO:            Date & time: Mon Sep 26 20:24:09 EDT 2011
> 11/09/26 20:24:09 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/26 20:24:09 INFO fs.TestDFSIO:      Throughput mb/sec: 13.065623285187982
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Average IO rate mb/sec: 15.160531997680664
> 11/09/26 20:24:09 INFO fs.TestDFSIO:  IO rate std deviation: 8.000530562022949
> 11/09/26 20:24:09 INFO fs.TestDFSIO:     Test exec time sec: 1123.447
>
>
> So, with local disks, reads are about 1 GB per second across the 8 nodes.

Much lower cost per TB too. Orders of magnitude lower.
Writes get streamed to multiple HDFS nodes for redundancy, so on top of
the raw disk bandwidth you've got the network overhead and 3x the data to
move.
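To put rough numbers on that (assuming the default replication factor of
3; the benchmark output above doesn't show dfs.replication):

   800,000 MB logical x 3 replicas = ~2.4 TB actually written to disk
   661 MB/s seen by clients x 3    = ~2 GB/s of physical disk writes,
                                     most of it also crossing the network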
Options:
  - Stop using HDFS on the SAN; it's the wrong approach. Mount the SAN
    directly on each node and use file:// URLs, and let the SAN do the
    networking and redundancy (sketch below).
  - Buy some local HDDs, at least for all the temp data: logs and the
    MapReduce overspill directories (mapred.local.dir). You don't need
    redundancy there (second sketch below).
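A minimal sketch of the first option, assuming the SAN LUN is mounted at
/mnt/san on every node (the path is a placeholder, not from the original
setup):

   # address the mounted SAN directly through Hadoop's local filesystem
   hadoop fs -ls file:///mnt/san/data

   # or make it the default, so plain paths resolve to the mount, by
   # setting fs.default.name to file:/// in core-site.xml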
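And a sketch of the second option (the mount points are placeholders):

   # create local dirs for task spill/temp data on each worker
   mkdir -p /disk1/mapred/local /disk2/mapred/local

   # then, in mapred-site.xml, set mapred.local.dir to the comma-separated
   # list /disk1/mapred/local,/disk2/mapred/local; the TaskTracker spreads
   # intermediate data across all of the listed directories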