Hadoop >> mail # dev >> Load balancing requests in HDFS


Re: Load balancing requests in HDFS
On 16/10/11 02:53, Bharath Ravi wrote:
> Hi all,
>
> I have a question about how HDFS load balances requests for files/blocks:
>
> HDFS currently distributes data blocks randomly, for balance.
> However, if certain files/blocks are more popular than others, some nodes
> might get an "unfair" number of requests.
> Adding more replicas for these popular files might not help, unless HDFS
> explicitly distributes requests fairly among the replicas.

Have a look at the ReplicationTargetChooser class; it does take datanode
load into account, though its concern is distribution for data
availability, not performance.

The standard technique for popular files (including MR job JAR files) is
to over-replicate. One problem: how to determine what is popular without
adding more load on the namenode
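
To make the over-replication point concrete, here is a toy simulation, not HDFS code. It assumes every read of one hot block goes to a replica picked uniformly at random, which is a simplification and not HDFS's actual replica selection. The class name HotBlockSim, the method maxReadsPerNode, and the request counts are all made up for illustration. It only shows that extra replicas lower the per-node peak when reads really are spread across them.

```java
import java.util.Random;

public class HotBlockSim {
    // Returns the largest number of reads served by any single node when
    // `requests` reads of one hot block are spread uniformly at random
    // over `replicas` nodes that each hold a copy.
    // Assumption: uniform random choice; real HDFS replica selection differs.
    public static int maxReadsPerNode(int replicas, int requests, long seed) {
        Random rng = new Random(seed);
        int[] served = new int[replicas];
        for (int i = 0; i < requests; i++) {
            served[rng.nextInt(replicas)]++;
        }
        int max = 0;
        for (int s : served) {
            max = Math.max(max, s);
        }
        return max;
    }

    public static void main(String[] args) {
        int requests = 10_000; // hypothetical read count for one hot block
        for (int replicas : new int[] {3, 10, 30}) {
            System.out.printf("replicas=%d max reads on one node=%d%n",
                    replicas, maxReadsPerNode(replicas, requests, 42L));
        }
    }
}
```

If reads are not spread out (say, most clients keep hitting the same replica), the peak stays high no matter how many copies exist, which is the concern raised in the original question.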