Ben,
That's defined in ReplicationTargetChooser: first local, second same rack, then random. You're right, it's 50/50 if cases one and two do not match.
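
To make the ordering discussed in this thread concrete, here is a minimal sketch in Python. It is not Hadoop's code: the function names, the (host, rack) tuples and the shuffle-based tie-breaking are assumptions made for illustration. It only models the policy as described here (local node first, then same-rack nodes, then remote nodes, with no load indicator consulted).

```python
import random
from collections import Counter


def sort_block_locations(reader, replicas, rng=random):
    """Order replicas by closeness to the reader.

    `reader` and every entry of `replicas` are hypothetical (host, rack)
    tuples. Replicas at the same distance are put in random order;
    current client count or any other load figure is never looked at.
    """
    reader_host, reader_rack = reader
    local = [r for r in replicas if r[0] == reader_host]
    same_rack = [r for r in replicas
                 if r[0] != reader_host and r[1] == reader_rack]
    remote = [r for r in replicas
              if r[0] != reader_host and r[1] != reader_rack]
    rng.shuffle(same_rack)  # tie-break among equally close nodes
    rng.shuffle(remote)
    return local + same_rack + remote


def first_choice_counts(reader, replicas, trials, rng=random):
    """Count how often each host ends up first in the sorted list."""
    return Counter(
        sort_block_locations(reader, replicas, rng)[0][0]
        for _ in range(trials)
    )
```

Under these assumptions, a client on rack1 reading a block held by A and B (both rack1) and C (rack2) gets A or B first about half the time each, and never C, regardless of how many clients A or B are already serving.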
- Alex
--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 6, 2012, at 11:56 AM, Ben Clay wrote:
> Alex-
>
> Understood. We do not have a situation that extreme, I was just looking for
> conceptual verification that reads are balanced across replicas of equal
> distance. From the PDF you linked:
>
> "For reading, the name node first checks if the client's computer is located
> in the cluster. If yes, block locations are returned to the client in the
> order of its closeness to the reader. The block is read from data nodes in
> this preference order."
>
> If two datanodes have equal closeness, I'd like to know how the NameNode
> chooses between them.
>
> -Ben
>
> -----Original Message-----
> From: alo.alt [mailto:[EMAIL PROTECTED]]
> Sent: Friday, January 06, 2012 12:45 PM
> To: [EMAIL PROTECTED]
> Subject: Re: HDFS load balancing for non-local reads
>
> Ben,
>
> that scenario should not happen. If one DN has 20 clients and the other zero
> (same block), the cluster (or DN) has another problem. Rack Awareness is
> described here:
>
> https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
>
> - Alex
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> On Jan 5, 2012, at 6:49 PM, Ben Clay wrote:
>
>> Suresh-
>> Thanks for the tips, I'll check those functions out, and examine plugging
>> in a different NetworkTopology.
>> So to clarify, under the current scheme, if we have 1 block on two local
>> rack nodes A and B, it randomly chooses between those? IE, if DataNode A is
>> serving 20 clients and DataNode B is serving 1 client, they both have a 50%
>> chance of being selected for the 21st client?
>> -Ben
>>
>> From: Suresh Srinivas [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, January 05, 2012 5:33 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: HDFS load balancing for non-local reads
>>
>> Currently it sorts the block locations as:
>> # local node
>> # local rack node
>> # random order of remote nodes
>>
>> See DatanodeManager#sortLocatedBlock(...) and
>> NetworkTopology#pseudoSortByDistance(...).
>>
>> You can play around with other policies by plugging in different
>> NetworkTopology.
>>
>> On Thu, Jan 5, 2012 at 1:40 PM, Ben Clay <[EMAIL PROTECTED]> wrote:
>> Hi-
>>
>> How does the NameNode handle load balancing of non-local reads with
>> multiple block locations when locality is equal?
>>
>> IE, if the client is equidistant (same rack) from 2 DataNodes hosting the
>> same block, does the NameNode consider current client count or any other
>> load indicators when deciding which DataNode will satisfy the read request?
>> Or, is the client provided a list of all split locations and is allowed to
>> make this choice themselves?
>>
>> Thanks!
>>
>> -Ben
>>
>
>