-
Hadoop performance benchmarking with TestDFSIO
Sameer Farooqui 2011-09-28, 21:45
Hi everyone,
I'm looking for some recommendations for how to get our Hadoop cluster to do faster I/O.
Currently, our lab cluster is 8 worker nodes and 1 master node (with NameNode and JobTracker).
Each worker node has:
- 48 GB RAM
- 16 processors (Intel Xeon E5630 @ 2.53 GHz)
- 1 Gb Ethernet connection

Due to company policy, we have to keep the HDFS storage on a disk array. Our SAS connected array is capable of 6 Gb (768 MB) for each of the 8 hosts. So, theoretically, we should be able to get a max of 6 GB simultaneous reads across the 8 nodes if we benchmark it.
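As a quick sanity check on the link speeds quoted above, here is a minimal Python sketch of the unit arithmetic. It assumes the figures are per-second rates and ignores all protocol and encoding overhead, so the results are upper bounds rather than expected throughput. The function name is made up for illustration.

```python
# Unit conversions for the link speeds quoted in the post.
# Assumption: the figures are rates per second and carry no protocol overhead.

def gbit_to_mbyte_per_s(gbit_per_s, mbit_per_gbit=1000):
    """Convert a link speed in Gb/s to MB/s (8 bits per byte)."""
    return gbit_per_s * mbit_per_gbit / 8

NODES = 8

sas_decimal = gbit_to_mbyte_per_s(6)        # decimal units: 1 Gb = 1000 Mb
sas_binary = gbit_to_mbyte_per_s(6, 1024)   # binary units, matches the 768 MB in the post
ethernet = gbit_to_mbyte_per_s(1)           # the 1 Gb Ethernet link on each worker

print(f"SAS per host: {sas_decimal:.0f} MB/s (decimal) or {sas_binary:.0f} MB/s (binary)")
print(f"SAS aggregate over {NODES} hosts: {NODES * sas_binary:.0f} MB/s")
print(f"Ethernet per node: {ethernet:.0f} MB/s, aggregate: {NODES * ethernet:.0f} MB/s")
```

The 768 MB figure only appears with binary units; with decimal units the same link is 750 MB/s. Either way, the single 1 Gb Ethernet link per node is a much lower ceiling for any traffic that has to cross the network.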
Our disk array is presenting each of the 8 nodes with a 21 TB LUN. The LUN is RAID-5 across 12 disks on the array. That LUN is partitioned on the server into 6 different devices like this:
>> df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdg1   3.5T  445G  2.9T   14%   /data2/d1
/dev/sdg2   3.5T  439G  2.9T   14%   /data2/d2
/dev/sdg3   3.5T  436G  2.9T   13%   /data2/d3
/dev/sdg4   3.5T  435G  2.9T   13%   /data2/d4
/dev/sdg5   3.5T  434G  2.9T   13%   /data2/d5
/dev/sdg6   3.5T  431G  2.9T   13%   /data2/d6
The file system type is ext3.
So, when we run TestDFSIO, here are the results:
*++ Write ++*
>> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -write -nrFiles 80 -fileSize 10000

11/09/27 18:54:53 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
11/09/27 18:54:53 INFO fs.TestDFSIO: Date & time: Tue Sep 27 18:54:53 EDT 2011
11/09/27 18:54:53 INFO fs.TestDFSIO: Number of files: 80
11/09/27 18:54:53 INFO fs.TestDFSIO: Total MBytes processed: 800000
11/09/27 18:54:53 INFO fs.TestDFSIO: Throughput mb/sec: 8.2742240008678
11/09/27 18:54:53 INFO fs.TestDFSIO: Average IO rate mb/sec: 8.288116455078125
11/09/27 18:54:53 INFO fs.TestDFSIO: IO rate std deviation: 0.3435565217052116
11/09/27 18:54:53 INFO fs.TestDFSIO: Test exec time sec: 1427.856
So, throughput across all 8 nodes is 8.27 * 80 = 661 MB per second.

*++ Read ++*
>> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -read -nrFiles 80 -fileSize 10000
11/09/27 19:43:12 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
11/09/27 19:43:12 INFO fs.TestDFSIO: Date & time: Tue Sep 27 19:43:12 EDT 2011
11/09/27 19:43:12 INFO fs.TestDFSIO: Number of files: 80
11/09/27 19:43:12 INFO fs.TestDFSIO: Total MBytes processed: 800000
11/09/27 19:43:12 INFO fs.TestDFSIO: Throughput mb/sec: 5.854318503905489
11/09/27 19:43:12 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.96372652053833
11/09/27 19:43:12 INFO fs.TestDFSIO: IO rate std deviation: 0.9885505979030621
11/09/27 19:43:12 INFO fs.TestDFSIO: Test exec time sec: 2055.465

So, throughput across all 8 nodes is 5.85 * 80 = 468 MB per second.

*Question 1:* Why are the reads and writes so much slower than expected? Any suggestions about what can be changed? I understand that RAID-5 backed disks are an unorthodox configuration for HDFS, but has anybody successfully done this? If so, what kind of results did you see?
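The aggregate figures above multiply the per-task "Throughput mb/sec" value by the number of files, which only holds if all 80 map tasks really ran at the same time. A rough cross-check is to divide the total megabytes by the wall-clock execution time. The Python sketch below computes both estimates from the numbers in the logs; the function name is hypothetical, and the wall-clock figure includes job startup and scheduling overhead, so the real aggregate rate probably lies somewhere between the two.

```python
# Two rough estimates of aggregate cluster throughput from TestDFSIO output.
# Inputs are copied from the write and read runs against the disk array.

def aggregate_estimates(n_files, total_mb, per_task_mb_s, exec_time_s):
    """Return (per-task rate * number of tasks, total MB / wall-clock seconds)."""
    by_tasks = per_task_mb_s * n_files       # assumes every task runs concurrently
    by_wall_clock = total_mb / exec_time_s   # includes job startup overhead
    return by_tasks, by_wall_clock

runs = {
    "write": (80, 800000, 8.2742240008678, 1427.856),
    "read": (80, 800000, 5.854318503905489, 2055.465),
}

for name, args in runs.items():
    by_tasks, by_wall_clock = aggregate_estimates(*args)
    print(f"{name}: {by_tasks:.1f} MB/s by task count, "
          f"{by_wall_clock:.1f} MB/s by wall clock")
```

If the cluster has fewer than 80 map slots in total, the tasks run in waves and the first estimate overstates what the cluster sustained at any one moment.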
Also, we detached the 8 nodes from the disk array and connected each of them to 6 local hard drives for testing (w/ ext4 file system). Then we ran the same read TestDFSIO and saw this:
11/09/26 20:24:09 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
11/09/26 20:24:09 INFO fs.TestDFSIO: Date & time: Mon Sep 26 20:24:09 EDT 2011
11/09/26 20:24:09 INFO fs.TestDFSIO: Number of files: 80
11/09/26 20:24:09 INFO fs.TestDFSIO: Total MBytes processed: 800000
11/09/26 20:24:09 INFO fs.TestDFSIO: Throughput mb/sec: 13.065623285187982
11/09/26 20:24:09 INFO fs.TestDFSIO: Average IO rate mb/sec: 15.160531997680664
11/09/26 20:24:09 INFO fs.TestDFSIO: IO rate std deviation: 8.000530562022949
11/09/26 20:24:09 INFO fs.TestDFSIO: Test exec time sec: 1123.447

So, with local disks, reads are about 1 GB per second across the 8 nodes. Much faster!

With 6 local disks, writes performed the same though:
11/09/26 19:49:58 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
11/09/26 19:49:58 INFO fs.TestDFSIO: Date & time: Mon Sep 26 19:49:58 EDT 2011
11/09/26 19:49:58 INFO fs.TestDFSIO: Number of files: 80
11/09/26 19:49:58 INFO fs.TestDFSIO: Total MBytes processed: 800000
11/09/26 19:49:58 INFO fs.TestDFSIO: Throughput mb/sec: 8.573949802610528
11/09/26 19:49:58 INFO fs.TestDFSIO: Average IO rate mb/sec: 8.588902473449707
11/09/26 19:49:58 INFO fs.TestDFSIO: IO rate std deviation: 0.3639466752546032
11/09/26 19:49:58 INFO fs.TestDFSIO: Test exec time sec: 1383.734

Write throughput across the cluster was 685 MB per second.

By the way, our HDFS file system is healthy:
Status: HEALTHY
 Total size: 9018951544337 B
 Total dirs: 24230
 Total files: 1032578
 Total blocks (validated): 1139580 (avg. block size 7914276 B)
 Minimally replicated blocks: 1139580 (100.0 %)
 Over-replicated blocks: 1 (8.775163E-5 %)
 Under-replicated blocks: 16 (0.001404026 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 2
 Average block replication: 2.0122387
 Corrupt blocks: 0
 Missing replicas: 32 (0.0013954865 %)
 Number of data-nodes: 8
 Number of racks: 1
FSCK ended at Tue Sep 27 18:57:23 EDT 2011 in 7453 milliseconds

The filesystem under path '/' is HEALTHY

- Sameer
-
Re: Hadoop performance benchmarking with TestDFSIO
Steve Loughran 2011-09-29, 09:58
On 28/09/11 22:45, Sameer Farooqui wrote:
> Hi everyone,
>
> I'm looking for some recommendations for how to get our Hadoop cluster to do
> faster I/O.
>
> Currently, our lab cluster is 8 worker nodes and 1 master node (with
> NameNode and JobTracker).
>
> Each worker node has:
> - 48 GB RAM
> - 16 processors (Intel Xeon E5630 @ 2.53 GHz)
> - 1 Gb Ethernet connection
>
> Due to company policy, we have to keep the HDFS storage on a disk array. Our
> SAS connected array is capable of 6 Gb (768 MB) for each of the 8 hosts. So,
> theoretically, we should be able to get a max of 6 GB simultaneous reads
> across the 8 nodes if we benchmark it.
You're missing the point of Hadoop there: you will end up getting the bandwidth of the HDD most likely to fail next, copy replication on top of RAID is overkill, and you will reach limits on scale, both technical (SAN scalability) and financial.
>
> Our disk array is presenting each of the 8 nodes with a 21 TB LUN. The LUN
> is RAID-5 across 12 disks on the array. That LUN is partitioned on the
> server into 6 different devices like this:
>
> The file system type is ext3.
Set noatime on those mounts.
>
> So, when we run TestDFSIO, here are the results:
>
> *++ Write ++*
> >> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -write -nrFiles 80 -fileSize 10000
>
> 11/09/27 18:54:53 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Date & time: Tue Sep 27 18:54:53 EDT 2011
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Number of files: 80
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Throughput mb/sec: 8.2742240008678
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Average IO rate mb/sec: 8.288116455078125
> 11/09/27 18:54:53 INFO fs.TestDFSIO: IO rate std deviation: 0.3435565217052116
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Test exec time sec: 1427.856
>
> So, throughput across all 8 nodes is 8.27 * 80 = 661 MB per second.
>
> *++ Read ++*
> >> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -read -nrFiles 80 -fileSize 10000
>
> 11/09/27 19:43:12 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Date & time: Tue Sep 27 19:43:12 EDT 2011
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Number of files: 80
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Throughput mb/sec: 5.854318503905489
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.96372652053833
> 11/09/27 19:43:12 INFO fs.TestDFSIO: IO rate std deviation: 0.9885505979030621
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Test exec time sec: 2055.465
>
> So, throughput across all 8 nodes is 5.85 * 80 = 468 MB per second.
>
> *Question 1:* Why are the reads and writes so much slower than expected? Any
> suggestions about what can be changed? I understand that RAID-5 backed disks
> are an unorthodox configuration for HDFS, but has anybody successfully done
> this? If so, what kind of results did you see?
>
> Also, we detached the 8 nodes from the disk array and connected each of them
> to 6 local hard drives for testing (w/ ext4 file system). Then we ran the
> same read TestDFSIO and saw this:
>
> 11/09/26 20:24:09 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Date & time: Mon Sep 26 20:24:09 EDT 2011
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Number of files: 80
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Throughput mb/sec: 13.065623285187982
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Average IO rate mb/sec: 15.160531997680664
> 11/09/26 20:24:09 INFO fs.TestDFSIO: IO rate std deviation: 8.000530562022949
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Test exec time sec: 1123.447
>
> So, with local disks, reads are about 1 GB per second across the 8 nodes.
Much lower cost per TB too. Orders of magnitude lower.

Writes get streamed to multiple HDFS nodes for redundancy; you've got the bandwidth + network overhead and 3x the data.

Options
- Stop using HDFS on the SAN, it's the wrong approach. Mount the SAN directly and use file:// URLs, let the SAN do the networking and redundancy.
- Buy some local HDDs, at least for all the temp data: logs, overspill mapreduce.tmp.dir. You don't need redundancy here.
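To put a rough number on the replication overhead for writes, here is a back-of-the-envelope Python sketch. It is a simplified model, not a description of how HDFS schedules traffic: it assumes the first replica of each block is written on the node running the task, every additional replica crosses the network exactly once, the load is spread evenly, and each node has the single 1 Gb Ethernet link from the original post (about 125 MB/s with no overhead). The function name is made up for illustration.

```python
# Network-imposed ceiling on aggregate HDFS write throughput.
# Simplified model: first replica is local, each extra replica crosses the
# network once, and traffic is spread evenly over all nodes.

def network_write_ceiling(nodes, nic_mb_s, replication):
    """Upper bound (MB/s) on aggregate client write rate set by the NICs alone."""
    remote_copies = replication - 1
    if remote_copies == 0:
        return float("inf")  # nothing crosses the network in this model
    return nodes * nic_mb_s / remote_copies

for replication in (1, 2, 3):
    ceiling = network_write_ceiling(nodes=8, nic_mb_s=125, replication=replication)
    print(f"replication {replication}: ceiling {ceiling} MB/s")
```

Under these assumptions the ceiling is 1000 MB/s at a replication factor of 2 (the default reported by the fsck output in the original post) and 500 MB/s at 3. That would be consistent with writes staying around 660 to 685 MB/s on both the array and the local disks, since the network rather than the storage may be the limit, but the model is too coarse to prove it.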