Re: disk used percentage is not symmetric on datanodes (balancer)
On both types of nodes, what is your dfs.data.dir set to? Does it specify
multiple folders on the same set of drives, or is it 1-1 between folder and
drive? If it's set to multiple folders on the same drives, it is probably
multiplying the amount of "available capacity" incorrectly, in that it
assumes a 1-1 relationship between folder and the total capacity of the
drive.
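
For illustration, a minimal hdfs-site.xml sketch of the 1-1 layout suggested
above; the property name is the 1.x-era dfs.data.dir, and the mount-point
paths are hypothetical:

  <!-- One dfs.data.dir entry per physical drive. Listing several
       directories that live on the same drive can make the DataNode
       count that drive's capacity once per directory, as described above. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/disk1/dfs/data,/mnt/disk2/dfs/data,/mnt/disk3/dfs/data</value>
  </property>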
On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <[EMAIL PROTECTED]> wrote:

> Yes, thanks for pointing that out, but I already know that it is completing
> the balancing when exiting; otherwise it wouldn't exit.
> Your answer doesn't solve the problem I mentioned earlier in my message:
> 'hdfs' is stalling and hadoop is not writing unless space is cleared up
> from the cluster, even though "df" shows the cluster has about 500 TB of
> free space.
>
> -------
>
>
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
> [EMAIL PROTECTED]> wrote:
>
>  -setBalancerBandwidth <bandwidth in bytes per second>
>
> So the value is in bytes per second. If it is running and exiting, it means
> it has completed the balancing.
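
For example, the balancer bandwidth can be raised at runtime with dfsadmin
(the value is in bytes per second; 100 MB/s below is an arbitrary
illustration):

  hadoop dfsadmin -setBalancerBandwidth 104857600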
>
>
> On 24 March 2013 11:32, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>
>> Yes, we are running the balancer, though a balancer process runs for almost
>> a day or more before exiting and starting over.
>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>> that's bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If it
>> is in bits then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>
>> -----
>>
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>> [EMAIL PROTECTED]> wrote:
>>
>> Are you running the balancer? If the balancer is running and is slow, try
>> increasing the balancer bandwidth.
>>
>>
>> On 24 March 2013 09:21, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks for the follow-up. I don't know whether the attachment will pass
>>> through this mailing list, but I am attaching a pdf that contains the
>>> usage of all live nodes.
>>>
>>> All nodes starting with the letter "g" are the ones with smaller storage
>>> space, whereas nodes starting with the letter "s" have larger storage
>>> space. As you will see, most of the "gXX" nodes are completely full,
>>> whereas the "sXX" nodes have a lot of unused space.
>>>
>>> Recently, we have been facing a crisis frequently: 'hdfs' goes into a mode
>>> where it is not able to write any further, even though the total space
>>> available in the cluster is about 500 TB. We believe this has something to
>>> do with the way it is balancing the nodes, but we don't understand the
>>> problem yet. Maybe the attached PDF will help some of you (experts) see
>>> what is going wrong here...
>>>
>>> Thanks
>>> ------
>>>
>>>
>>> The balancer knows about the topology, but when it calculates balancing it
>>> operates only on nodes, not on racks.
>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>> line 509.
>>>
>>> I was wrong about the 350Tb/35Tb figures; it is calculated in the
>>> following way:
>>>
>>> For example:
>>> cluster_capacity = 3.5 PB
>>> cluster_dfsused = 2 PB
>>>
>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>>> capacity used.
>>> Then we know each node's utilization (node_dfsused / node_capacity * 100).
>>> The balancer thinks all is good if
>>> avgutil - 10 <= node_utilization < avgutil + 10 (10 is the default
>>> threshold).
>>>
>>> The ideal case is that every node uses avgutil of its capacity, but for a
>>> 12 TB node that is only about 6.9 TB, and for a 72 TB node about 41 TB.
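
As a sketch of that arithmetic (not Hadoop's actual Balancer code; the class
and method names are invented for illustration):

  // BalancedCheck.java - the threshold test described above.
  public class BalancedCheck {
      static final double THRESHOLD = 10.0; // balancer default, percentage points

      static boolean isBalanced(double nodeUtil, double avgUtil) {
          return nodeUtil >= avgUtil - THRESHOLD && nodeUtil < avgUtil + THRESHOLD;
      }

      public static void main(String[] args) {
          double avgUtil = 2.0 / 3.5 * 100.0; // 2 PB used of 3.5 PB = 57.14%
          // A 12 TB node holding ~6.9 TB sits inside the band -> balanced.
          System.out.println(isBalanced(6.9 / 12.0 * 100.0, avgUtil)); // true
          // A completely full "gXX" node is far above avgutil + 10 -> not balanced.
          System.out.println(isBalanced(100.0, avgUtil));              // false
      }
  }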
>>>
>>> The balancer can't help you here.
>>>
>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>> you can.
>>>
>>>>
>>>> In the ideal case with replication factor 2, with two nodes of 12 TB and
>>>> 72 TB, you will be able to have only 12 TB of replicated data.
>>>>
>>>>
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>
>>>>
>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>> must have identical capacity, and racks must have identical capacity.
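
For reference, rack membership is typically defined with a topology script; a
minimal core-site.xml sketch (the property name is the 1.x-era
topology.script.file.name, and the script path is hypothetical):

  <!-- The script is handed datanode IPs/hostnames as arguments and must
       print one rack id (e.g. /rack1) per argument. -->
  <property>
    <name>topology.script.file.name</name>
    <value>/etc/hadoop/conf/topology.sh</value>
  </property>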