Hadoop >> mail # user >> Re: disk used percentage is not symmetric on datanodes (balancer)


Tapas Sarangi 2013-03-18, 21:46
Bertrand Dechoux 2013-03-18, 23:17
Tapas Sarangi 2013-03-19, 15:04
Re: disk used percentage is not symmetric on datanodes (balancer)
Thanks for the reply. How can I set a new transfer speed for the balancer? Is dfs.balance.bandwidthPerSec the right parameter?

Where should this go: conf/hdfs-site.xml or conf/core-site.xml?
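
For reference, a minimal sketch of how such a setting could be written. It assumes the Hadoop 0.20/1.x property name dfs.balance.bandwidthPerSec, which is an HDFS setting and so belongs in conf/hdfs-site.xml, with a value in bytes per second. The helper name balancer_bandwidth_property and the 10 MB/s figure are illustrative only, not a recommendation.

```python
import xml.etree.ElementTree as ET


def balancer_bandwidth_property(megabytes_per_sec):
    """Render an hdfs-site.xml <property> element for the balancer bandwidth.

    Assumption: the property value is expressed in bytes per second.
    """
    bytes_per_sec = int(megabytes_per_sec * 1024 * 1024)
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.balance.bandwidthPerSec"
    ET.SubElement(prop, "value").text = str(bytes_per_sec)
    return ET.tostring(prop, encoding="unicode")


# Illustrative value: allow 10 MB/s per datanode for balancing traffic.
print(balancer_bandwidth_property(10))
```

The printed element would go inside the <configuration> block of conf/hdfs-site.xml on the datanodes. As far as I know, the datanodes read this file at startup, so they would need a restart to pick up the change.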

-Tapas
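
On the usage-percentage question quoted below, here is a simplified sketch of the comparison the balancer makes: each datanode's used percentage is compared with the cluster-wide used percentage, and a node counts as balanced when the two differ by no more than a threshold (10 percent by default). The function name, node names, and storage figures are hypothetical; the real balancer has more categories and moves blocks iteratively.

```python
def classify_datanodes(nodes, threshold_pct=10.0):
    """Classify datanodes by used percentage relative to the cluster average.

    nodes maps a (hypothetical) node name to (used_tb, capacity_tb).
    Returns the cluster-wide used percentage and a per-node label.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    labels = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold_pct:
            labels[name] = "over-utilized"
        elif node_pct < cluster_pct - threshold_pct:
            labels[name] = "under-utilized"
        else:
            labels[name] = "within threshold"
    return cluster_pct, labels


# Made-up figures: one 12 TB node that is nearly full, one 72 TB node.
cluster_pct, labels = classify_datanodes(
    {"small-node": (11.0, 12.0), "big-node": (31.0, 72.0)}
)
print(cluster_pct, labels)
```

Because the comparison is on percentages rather than absolute bytes, a nearly full 12 TB node can be flagged while a 72 TB node holding far more data is not.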

 
On Mar 19, 2013, at 11:05 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> If your balancer does not exit, it means it is working hard in
> iterations to balance your cluster. The default bandwidth
> allows only a limited transfer speed (1 MB/s, a little under
> 10 Mbps) so as not to affect the cluster's RW performance while
> moving blocks between DNs for balancing, so the operation may be
> slow unless you raise the allowed bandwidth.
>
> On Wed, Mar 20, 2013 at 7:37 AM, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>> Any more follow ups ?
>>
>> Thanks
>> -Tapas
>>
>> On Mar 19, 2013, at 9:55 AM, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> On Mar 18, 2013, at 11:50 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>>> What do you mean that the balancer is always active?
>>>
>>> Meaning the same process stays active for a long time. The process that starts may not be exiting at all. We have a cron job set to run it every 10 minutes, but that has no effect because the process may never exit.
>>>
>>>
>>>> It is to be used
>>>> as a tool and it exits once it balances in a specific run (loops until
>>>> it does, but always exits at end). The balancer does balance based on
>>>> usage percentage so that is what you're probably looking for/missing.
>>>>
>>>
>>> Maybe. How does the balancer determine the usage percentage?
>>>
>>> -Tapas
>>>
>>>
>>>> On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>>>>> Hi,
>>>>>
>>>>> On Mar 18, 2013, at 8:21 PM, 李洪忠 <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Maybe you need to modify the rack awareness script to balance the racks,
>>>>> i.e., make all the racks the same size: one rack of 6 small nodes, one
>>>>> rack of 1 large node.
>>>>> P.S.
>>>>> You need to restart the cluster for the rack awareness script change to take effect.
>>>>>
>>>>>
>>>>> As I mentioned earlier in my reply to Bertrand, we haven't set up rack
>>>>> awareness for the cluster; currently it is treated as just one rack. Can
>>>>> that be the problem? I don't know…
>>>>>
>>>>> -Tapas
>>>>>
>>>>>
>>>>>
>>>>> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>>>>>
>>>>> And by active, do you mean that it actually stops by itself? Otherwise, the
>>>>> throttling limit might be an issue given the data volume or
>>>>> velocity.
>>>>>
>>>>> What threshold is used?
>>>>>
>>>>> About the small and big datanodes, how are they distributed with regard to
>>>>> racks?
>>>>> About files, what replication factor(s) and block size(s) are used?
>>>>>
>>>>> Surely trivial questions again.
>>>>>
>>>>> Bertrand
>>>>>
>>>>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Sorry about that, had it written, but thought it was obvious.
>>>>>> Yes, balancer is active and running on the namenode.
>>>>>>
>>>>>> -Tapas
>>>>>>
>>>>>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It is not explicitly said but did you use the balancer?
>>>>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Bertrand
>>>>>>
>>>>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am using an old legacy version (0.20) of Hadoop for our
>>>>>>> cluster. We have scheduled an upgrade to a newer version within a
>>>>>>> couple of months, but I would like to understand a couple of things before
>>>>>>> moving ahead with the upgrade plan.
>>>>>>>
>>>>>>> We have about 200 datanodes and some of them have larger storage than
>>>>>>> others. The storage for the datanodes varies between 12 TB and 72 TB.
>>>>>>>
>>>>>>> We found that the disk-used percentage is not symmetric through all the
Tapas Sarangi 2013-03-24, 18:32
Alexey Babutin 2013-03-25, 14:14