Hadoop, mail # dev - Load balancing requests in HDFS


Re: Load balancing requests in HDFS
Steve Loughran 2011-10-18, 16:37
On 16/10/11 02:53, Bharath Ravi wrote:
> Hi all,
>
> I have a question about how HDFS load balances requests for files/blocks:
>
> HDFS currently distributes data blocks randomly, for balance.
> However, if certain files/blocks are more popular than others, some nodes
> might get an "unfair" number of requests.
> Adding more replicas for these popular files might not help, unless HDFS
> explicitly distributes requests fairly among the replicas.

Have a look at the ReplicationTargetChooser class; it does take datanode
load into account, though its concern is distribution for data
availability, not performance.

The standard technique for popular files, including MR job JAR files, is
to over-replicate. One problem: how to determine what is popular without
adding more load on the namenode.
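
To make the over-replication point concrete, here is a rough, self-contained simulation rather than anything taken from HDFS itself. It assumes each read of a hot block goes to a replica picked uniformly at random, which is a simplification and not a claim about how HDFS actually orders or selects replicas. The function name simulate_hot_block and all of its parameters are hypothetical.

```python
import random
from collections import Counter


def simulate_hot_block(num_nodes, replication, num_reads, seed=0):
    """Return per-node read counts for a single popular block.

    Assumption (not HDFS behaviour): every read is served by a
    replica chosen uniformly at random.
    """
    rng = random.Random(seed)
    # Place the replicas on distinct, randomly chosen nodes.
    replica_nodes = rng.sample(range(num_nodes), replication)
    load = Counter()
    for _ in range(num_reads):
        load[rng.choice(replica_nodes)] += 1
    return load


if __name__ == "__main__":
    # Compare the busiest node at two replication factors.
    for replication in (3, 10):
        load = simulate_hot_block(num_nodes=20, replication=replication,
                                  num_reads=10_000)
        print(f"replication={replication} busiest node served "
              f"{max(load.values())} reads")
```

Under that assumption the busiest node's share of reads shrinks roughly in proportion to the replication factor. If requests are not spread across replicas, the extra copies do not help, which is the concern raised in the original question.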