storage performance problem
I'm seeing an odd storage performance problem that I hope can be fixed with the right configuration parameter, but nothing I've tried so far helps.  These tests were done in a virtual machine running on ESX, but earlier tests on native RHEL showed something similar.

Common configuration:
7 nodes with 10 GbE interconnect.
Each node: 2-socket Westmere, 96 GB RAM, 10 local SATA disks exported to the VM as JBODs, single 92 GB VM.
TestDFSIO: 140 files, 7143 MB each (about 1 TB total data), so 2 map tasks per disk.  Replication=2.
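
For reference, the invocation was roughly this (the jar path is from my CDH3u0 install, so adjust for yours; -fileSize is in MB):

  # TestDFSIO write test: 140 files of 7143 MB each
  hadoop jar /usr/lib/hadoop-0.20/hadoop-test-*.jar TestDFSIO \
      -write -nrFiles 140 -fileSize 7143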

Case A:  RHEL 5.5, EXT3 file system, write-through configured on the physical disks
Case B:  RHEL 6.1, EXT4 file system, write-back

Testing with aio-stress shows that the changes made for Case B all improved efficiency and performance.  But running the write test of TestDFSIO on Hadoop (using CDH3u0) got worse:

Case A:  580 seconds exec time
Case B:  740 seconds

I can improve Case B to 710 seconds by going back to EXT3, or by mounting EXT4 with min_batch_time=2000, so slowing down the FS improves Hadoop performance.
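
The EXT4 workaround is just a mount option (device and mount point below are placeholders; min_batch_time is in microseconds, default 0, so this forces each journal commit batch to wait at least 2 ms):

  mount -t ext4 -o min_batch_time=2000 /dev/sdb1 /data/1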

Both cases show a peak write throughput of about 550 MB/s on each node.  The difference is that in Case A the throughput is steady and doesn't drop below 500 MB/s, while in Case B it is very noisy, sometimes dropping all the way to 0.  It is also sometimes periodic, rising and falling with a 15-30 second period, and that period is synchronized across all the nodes.  550 MB/s appears to be a controller limit; each disk alone is capable of 130 MB/s (with a raw partition or EXT4; EXT3 is about 100 MB/s).  I tried replication=1 to eliminate nearly all networking, but storage throughput was still not steady.
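
The oscillation is easy to see with one-second iostat samples on each node (iostat is from the sysstat package; the device names are placeholders for the ten data disks):

  # per-device extended stats in MB/s, sampled every second
  iostat -dmx 1 /dev/sd[b-k]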

I'm thinking that faster storage somehow confuses the scheduler, but I don't see what the mechanism is.  Any ideas about what's going on, or things to try?  I don't want to have to recommend de-tuning storage in order to get Hadoop to behave.
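
In case it's relevant to the discussion, checking or switching the block-layer elevator per disk looks like this (sdb is a placeholder; the active scheduler is the one shown in brackets):

  cat /sys/block/sdb/queue/scheduler
  echo deadline > /sys/block/sdb/queue/scheduler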

Thanks for the help,

Jeff