Number of data dirs: 8

Events/Sec
Sink Count   1 data dirs   2 data dirs   4 data dirs   6 data dirs   8 data dirs   10 data dirs
8            24.8 k        43.8 k        72.5 k        77 k          78.6 k        76.6 k

Partial rows (Sink Count: Events/Sec):
1: 14.3 k
2: 21.9 k
10: 58 k
12: 49.3 k, 49 k
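To make the effect of extra data dirs easier to read, here is a rough sketch that computes the speedup over a single data dir for the 8-sink row, the only row with a value for every data-dir count. The variable names are made up for illustration; the numbers are copied from the row above.

```python
# Events/sec (in thousands) for the 8-sink row, keyed by number of data dirs.
# Values are copied from the measurements above; names are illustrative only.
eight_sinks_k = {1: 24.8, 2: 43.8, 4: 72.5, 6: 77.0, 8: 78.6, 10: 76.6}

baseline = eight_sinks_k[1]

# Speedup of each configuration relative to a single data dir.
speedup = {dirs: rate / baseline for dirs, rate in eight_sinks_k.items()}

# Data-dir count with the highest measured throughput.
best_dirs = max(eight_sinks_k, key=eight_sinks_k.get)

for dirs, factor in speedup.items():
    print(f"{dirs:>2} data dirs: {eight_sinks_k[dirs]:5.1f} k events/sec ({factor:.2f}x)")
print(f"peak at {best_dirs} data dirs")
```

For this row the gain flattens out after 4 data dirs and the peak is at 8.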
Was looking for the sweet spot in perf, so did not take measurements for all data points on the grid. Only took them for the ones that made sense. For example: when perf dropped by adding more sinks, did not take more measurements for those rows.

2. HDFS Sink:
# of HDFS
BatchSz:1.4mill Sequence File
BatchSz:1.2mill
1     34.3 k   33 k    33 k
2     71 k     75 k    69 k
4     141 k    145 k   141 k
8     271 k    273 k   251 k
12    382 k    380 k   370 k
16    478 k    538 k   486 k
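One way to check how well throughput scales with the number of HDFS sinks is to compare each row against perfect linear scaling from the 1-sink row. This is only a sketch: the three value columns are indexed 0 to 2 in table order without assuming their labels, and `hdfs_k` and `scaling_efficiency` are made-up names.

```python
# Events/sec (in thousands) per HDFS sink count. Each tuple holds the three
# value columns in the order they appear in the table; labels are not assumed.
hdfs_k = {
    1: (34.3, 33, 33),
    2: (71, 75, 69),
    4: (141, 145, 141),
    8: (271, 273, 251),
    12: (382, 380, 370),
    16: (478, 538, 486),
}


def scaling_efficiency(sinks: int, column: int) -> float:
    """Measured throughput divided by (sinks x the 1-sink throughput)."""
    return hdfs_k[sinks][column] / (sinks * hdfs_k[1][column])


for sinks in hdfs_k:
    effs = ", ".join(f"{scaling_efficiency(sinks, c):.2f}" for c in range(3))
    print(f"{sinks:>2} sinks: {effs}")
```

A value near 1.0 means that sink count is still scaling close to linearly in that column.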
Some simple observations:

* Increasing number of dataDirs helps FC perf even on single disk systems
* Increasing number of sinks helps
* Max throughput observed was about 538k events/sec for HDFS sink, which is approx 240MB/s
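The last figure can be turned into an implied average event size with simple arithmetic. This sketch assumes 1 MB = 10^6 bytes, which the numbers above do not state; the variable names are illustrative.

```python
events_per_sec = 538_000        # peak HDFS sink throughput reported above
bytes_per_sec = 240 * 10**6     # "approx 240MB/s", ASSUMING 1 MB = 10^6 bytes

# Average bytes carried per event if both figures describe the same run.
implied_event_size = bytes_per_sec / events_per_sec

print(f"implied average event size: {implied_event_size:.0f} bytes")
```

Under that assumption the two figures work out to roughly 446 bytes per event.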
Done. Please let me know if you run into any issues.
On Wed, Apr 8, 2015 at 3:58 PM, Roshan Naik <[EMAIL PROTECTED]> wrote: