

Re: disk used percentage is not symmetric on datanodes (balancer)
Jamal B 2013-03-25, 02:09
Yes
On Mar 24, 2013 9:25 PM, "Tapas Sarangi" <[EMAIL PROTECTED]> wrote:

> Thanks. Does this need a restart of hadoop in the nodes where this
> modification is made ?
>
> -----
>
> On Mar 24, 2013, at 8:06 PM, Jamal B <[EMAIL PROTECTED]> wrote:
>
> dfs.datanode.du.reserved
>
> You could tweak that param on the smaller nodes to "force" the flow of
> blocks to other nodes.   A short term hack at best, but should help the
> situation a bit.
> On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <[EMAIL PROTECTED]> wrote:
>
>>
>> On Mar 24, 2013, at 4:34 PM, Jamal B <[EMAIL PROTECTED]> wrote:
>>
>> It shouldn't cause further problems, since most of your small nodes are
>> already at their capacity.  You could set or increase the dfs reserved
>> property on your smaller nodes to force the flow of blocks onto the larger
>> nodes.
>>
>>
>> Thanks.  Can you please specify which dfs properties we can set or modify
>> to force the flow of blocks towards the larger nodes rather than the
>> smaller nodes?
>>
>> -----
>>
>>
>>
>>
>>
>>
>> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the idea, I will give this a try and report back.
>>>
>>> My worry is: if we decommission the small nodes (one at a time), will it
>>> move the data to larger nodes or choke other smaller nodes? In principle
>>> it should distribute the blocks, but the point is that it is not
>>> distributing them the way we expect, so do you think this may cause
>>> further problems?
>>>
>>> ---------
>>>
>>> On Mar 24, 2013, at 3:37 PM, Jamal B <[EMAIL PROTECTED]> wrote:
>>>
>>> Then I think the only way around this would be to decommission the
>>> smaller nodes one at a time, and ensure that the blocks are moved to the
>>> larger nodes.
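>>>
>>> For reference, the usual decommission procedure is a sketch like this
>>> (the excludes-file path and hostname are illustrative):
>>>
>>>   # dfs.hosts.exclude in hdfs-site.xml must point at the excludes file
>>>   echo small-node-01.example.com >> /etc/hadoop/conf/dfs.exclude
>>>   hadoop dfsadmin -refreshNodes
>>>   # wait for "Decommission Status : Decommissioned" before stopping it
>>>   hadoop dfsadmin -report
>>>
>>> The namenode re-replicates the node's blocks elsewhere before marking it
>>> decommissioned.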
>>>
>>> And once that is complete, bring the smaller nodes back in, but maybe
>>> only after you tweak the rack topology to reflect your disk layout rather
>>> than the network layout, to compensate for the unbalanced nodes.
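>>>
>>> A sketch of that idea: point topology.script.file.name (core-site.xml,
>>> 1.x name) at a script that groups nodes by disk capacity instead of by
>>> physical rack. The hostname patterns here are made up:
>>>
>>>   #!/bin/bash
>>>   # map each hostname argument to a capacity-based "rack"
>>>   for host in "$@"; do
>>>     case $host in
>>>       big-node-*)   echo /capacity/large ;;
>>>       small-node-*) echo /capacity/small ;;
>>>       *)            echo /capacity/default ;;
>>>     esac
>>>   done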
>>>
>>>
>>> Just my 2 cents
>>>
>>>
>>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>>>
>>>> Thanks. We have a 1-1 configuration of drives and folder in all the
>>>> datanodes.
>>>>
>>>> -Tapas
>>>>
>>>> On Mar 24, 2013, at 3:29 PM, Jamal B <[EMAIL PROTECTED]> wrote:
>>>>
>>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>>> specify multiple folders on the same sets of drives, or is it 1-1 between
>>>> folder and drive?  If it's set to multiple folders on the same drives, it
>>>> is probably multiplying the amount of "available capacity" incorrectly,
>>>> in that it assumes a 1-1 relationship between each folder and the total
>>>> capacity of the drive.
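>>>>
>>>> For illustration, a 1-1 layout in hdfs-site.xml looks like this (mount
>>>> points are made up); listing two folders on the same drive is what would
>>>> inflate the reported capacity:
>>>>
>>>>   <property>
>>>>     <name>dfs.data.dir</name>
>>>>     <!-- one folder per physical drive -->
>>>>     <value>/data/disk1/dfs,/data/disk2/dfs,/data/disk3/dfs</value>
>>>>   </property>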
>>>>
>>>>
>>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Yes, thanks for pointing that out, but I already know that it completes
>>>>> the balancing before exiting; otherwise it wouldn't exit.
>>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>>> message. 'hdfs' is stalling and hadoop is not writing unless space is
>>>>> cleared up from the cluster even though "df" shows the cluster has about
>>>>> 500 TB of free space.
>>>>>
>>>>> -------
>>>>>
>>>>>
>>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>>
>>>>> So the value is bytes per second. If it is running and exiting, it
>>>>> means it has completed the balancing.
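>>>>>
>>>>> For example (value illustrative), this applies cluster-wide at runtime,
>>>>> with no restart needed:
>>>>>
>>>>>   # 100 MB/s per datanode, expressed in bytes per second
>>>>>   hadoop dfsadmin -setBalancerBandwidth 104857600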
>>>>>
>>>>>
>>>>> On 24 March 2013 11:32, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Yes, we are running the balancer, though a balancer process runs for
>>>>>> almost a day or more before exiting and starting over.
>>>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I
>>>>>> assume that's bytes, so about 2 gigabytes/sec. Shouldn't that be
>>>>>> reasonable? If it is in bits then we have a problem.
>>>>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
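>>>>>>
>>>>>> Per the reply above, the unit is bytes per second. As a sketch, the
>>>>>> static equivalent in hdfs-site.xml (value illustrative):
>>>>>>
>>>>>>   <property>
>>>>>>     <name>dfs.balance.bandwidthPerSec</name>
>>>>>>     <value>104857600</value>  <!-- 100 MB/s, in bytes -->
>>>>>>   </property>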