HDFS >> mail # user >> Hadoop throughput question


RE: Hadoop throughput question
Let's suppose you are doing a read-intensive job such as counting records.  This will be limited by disk bandwidth.  On a 4-node cluster with 2 local SATA drives on each node, you should easily read 400MB/sec in aggregate.  When you are running the Hadoop cluster, is the Hadoop processing co-located with the Isilon nodes?  Is Hadoop configured to use OneFS or HDFS?
John
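
As a rough back-of-the-envelope check on the 400MB/sec figure in the reply above, here is a minimal sketch. The 50 MB/sec per-drive sequential read rate is an assumption (it is simply what 400MB/sec works out to across 8 drives), and the function names are made up for illustration. The raw line rate of a single 1GigE link is included for comparison, since any data that is not read from local disk has to cross the network.

```python
def aggregate_disk_mb_per_sec(nodes, disks_per_node, per_disk_mb_per_sec):
    """Aggregate sequential read bandwidth if every local drive is read in parallel."""
    return nodes * disks_per_node * per_disk_mb_per_sec


def gige_link_mb_per_sec(gbits=1.0):
    """Raw line rate of one Ethernet link in MB/sec (1 Gbit/sec = 1000 Mbit/sec / 8).

    Real payload throughput is lower because of protocol overhead.
    """
    return gbits * 1000 / 8


# Cluster shape from the thread: 4 nodes, 2 local SATA drives each.
# ASSUMPTION: 50 MB/sec per drive; substitute a measured value for your hardware.
local_read = aggregate_disk_mb_per_sec(nodes=4, disks_per_node=2, per_disk_mb_per_sec=50.0)
one_link = gige_link_mb_per_sec()

print(f"aggregate local disk read: {local_read:.0f} MB/sec")
print(f"one 1GigE link, raw line rate: {one_link:.0f} MB/sec")
```

These are only ceilings for comparison; they do not say where the bottleneck is in any particular setup.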

From: Artem Ervits [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 03, 2013 3:00 PM
To: [EMAIL PROTECTED]
Subject: Hadoop throughput question

Hello all,

I'd like to pick the community's brain on average throughput speeds for a moderately specced 4-node Hadoop cluster with 1GigE networking. Is it reasonable to expect constant average speeds of 150-200mb/sec on such a setup? Forgive me if the question is loaded, but we're running a Hadoop cluster with HDFS served via EMC Isilon storage. We're getting about 30mb/sec with our machines, and we do not see a difference in job speed between a 2-node cluster and a 4-node cluster.

Thank you.