MapReduce, mail # user - Re: disk used percentage is not symmetric on datanodes (balancer)


Re: disk used percentage is not symmetric on datanodes (balancer)
Jamal B 2013-03-24, 20:29
On both types of nodes, what is your dfs.data.dir set to? Does it specify
multiple folders on the same set of drives, or is it 1-1 between folder
and drive? If it is set to multiple folders on the same drives, it is
probably multiplying the amount of "available capacity" incorrectly, in
that it assumes a 1-1 relationship between folder and total capacity of
the drive.
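A toy sketch of the double-counting described above, with hypothetical device names and sizes: if capacity is tallied once per configured dfs.data.dir folder, two folders on the same physical drive report that drive's capacity twice.

```python
# Hypothetical drives and sizes, for illustration only.
DRIVE_CAPACITY_TB = {"/dev/sdb": 12, "/dev/sdc": 12}

# 1-1 mapping: one data dir per drive.
one_to_one = {"/data/1": "/dev/sdb", "/data/2": "/dev/sdc"}
# Misconfigured: two data dirs backed by the same drive.
shared = {"/data/1": "/dev/sdb", "/data/2": "/dev/sdb"}

def naive_capacity(dirs):
    """Sum drive capacity once per folder (assumes 1-1 folder/drive)."""
    return sum(DRIVE_CAPACITY_TB[drive] for drive in dirs.values())

def true_capacity(dirs):
    """Count each underlying drive's capacity only once."""
    return sum(DRIVE_CAPACITY_TB[d] for d in set(dirs.values()))

print(naive_capacity(one_to_one), true_capacity(one_to_one))  # 24 24
print(naive_capacity(shared), true_capacity(shared))          # 24 12
```

With the shared layout the naive tally reports 24 TB against a real 12 TB, which matches the kind of inflated "available capacity" suggested above.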
On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <[EMAIL PROTECTED]>wrote:

> Yes, thanks for pointing that out, but I already know that it is completing
> the balancing when exiting; otherwise it wouldn't exit.
> Your answer doesn't solve the problem I mentioned earlier in my message:
> 'hdfs' is stalling and hadoop is not writing unless space is cleared up
> from the cluster, even though "df" shows the cluster has about 500 TB of
> free space.
>
> -------
>
>
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
> [EMAIL PROTECTED]> wrote:
>
>  -setBalancerBandwidth <bandwidth in bytes per second>
>
> So the value is bytes per second. If it is running and exiting, it means it
> has completed the balancing.
>
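Since the value is in bytes per second, a quick sanity check of the 2x10^9 setting mentioned below in the thread (a sketch; the per-datanode scope is my reading of the property, not stated in the thread):

```python
# dfs.balance.bandwidthPerSec is interpreted in bytes per second.
bandwidth = 2 * 10**9            # the value quoted in the thread
gib_per_sec = bandwidth / 2**30  # convert bytes/sec to GiB/s
print(round(gib_per_sec, 2))     # ~1.86 GiB/s
```

So 2x10^9 really is roughly 2 gigabytes per second, as assumed below; if it were bits the effective limit would be 8x smaller.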
>
> On 24 March 2013 11:32, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>
>> Yes, we are running balancer, though a balancer process runs for almost a
>> day or more before exiting and starting over.
>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>> that's bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If
>> it is in bits, then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>
>> -----
>>
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>> [EMAIL PROTECTED]> wrote:
>>
>> Are you running balancer? If balancer is running and if it is slow, try
>> increasing the balancer bandwidth
>>
>>
>> On 24 March 2013 09:21, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks for the follow up. I don't know whether the attachment will pass
>>> through this mailing list, but I am attaching a pdf that contains the
>>> usage of all live nodes.
>>>
>>> All nodes starting with the letter "g" are the ones with smaller storage
>>> space, whereas nodes starting with the letter "s" have larger storage
>>> space. As you will see, most of the "gXX" nodes are completely full,
>>> whereas "sXX" nodes have a lot of unused space.
>>>
>>> Recently, we are facing this crisis frequently: 'hdfs' goes into a mode
>>> where it is not able to write any further, even though the total space
>>> available in the cluster is about 500 TB. We believe this has something
>>> to do with the way it is balancing the nodes, but we don't understand
>>> the problem yet. Maybe the attached PDF will help some of you (experts)
>>> see what is going wrong here...
>>>
>>> Thanks
>>> ------
>>>
>>>
>>> The balancer knows about topology, but when it calculates balancing it
>>> operates only on nodes, not on racks. You can see how it works in
>>> Balancer.java, in BalancerDatanode, around line 509.
>>>
>>> I was wrong about 350Tb; it is 35Tb. It is calculated this way:
>>>
>>> For example:
>>> cluster_capacity = 3.5Pb
>>> cluster_dfsused = 2Pb
>>>
>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% used cluster
>>> capacity. Then we know the node utilization
>>> (node_dfsused / node_capacity * 100). The balancer thinks all is good if
>>> avgutil + 10 > node_utilization >= avgutil - 10.
>>>
>>> In the ideal case every node would use avgutil of its capacity, but for
>>> a 12Tb node that is only about 6.5Tb, and for a 72Tb node it is about
>>> 40Tb.
>>>
>>> The balancer can't help you.
>>>
>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
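The per-node check described above can be sketched as follows (a minimal sketch of the described ±10 threshold rule, not the actual Balancer.java code; the exact rounding of the per-node figures differs slightly from the message's numbers):

```python
# Cluster-wide average utilization, as a percentage.
def avg_utilization(cluster_used_tb, cluster_capacity_tb):
    return cluster_used_tb / cluster_capacity_tb * 100

# A node is "balanced" if its utilization is within +/-10 points
# of the cluster average, per the rule described in the thread.
def is_balanced(node_used_tb, node_capacity_tb, avgutil, threshold=10.0):
    node_util = node_used_tb / node_capacity_tb * 100
    return avgutil - threshold <= node_util < avgutil + threshold

avgutil = avg_utilization(2_000, 3_500)  # 2 PB used of 3.5 PB capacity
print(round(avgutil, 2))                 # 57.14

# Ideal per-node usage at avgutil:
print(round(12 * avgutil / 100, 1))      # ~6.9 TB on a 12 TB node
print(round(72 * avgutil / 100, 1))      # ~41.1 TB on a 72 TB node

# A completely full 12 TB node (100% used) falls outside the band:
print(is_balanced(12, 12, avgutil))      # False
```

This shows why small nodes can sit at 100% while the balancer still reports a nearly balanced cluster: the check is per-node utilization relative to the average, not absolute free space.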
>>>
>>>
>>>
>>>>
>>>>
>>>> In the ideal case with replication factor 2, with two nodes of 12Tb and
>>>> 72Tb, you will be able to have only 12Tb of replicated data.
>>>>
>>>>
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>
>>>>
>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>> must have identical capacity, and racks must have identical capacity.