Re: disk used percentage is not symmetric on datanodes (balancer)
And by active, do you mean that it actually stops by itself? If it does
not, the throttling/limit might be an issue with regard to the data
volume or velocity.
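To put a rough number on the throttling concern, here is a back-of-the-envelope sketch. It assumes the balancer bandwidth limit is still at what I believe is the 0.20 default of 1 MB/s per datanode (dfs.balance.bandwidthPerSec), and the 1 TiB figure is purely hypothetical, not something measured on your cluster.

```python
def days_to_move(bytes_to_move, bandwidth_bytes_per_sec=1024 * 1024):
    """Lower bound, in days, for one datanode to shed data at the throttle.

    The default bandwidth is an assumption (1 MiB/s); check the value
    actually configured on the cluster.
    """
    seconds = bytes_to_move / bandwidth_bytes_per_sec
    return seconds / 86400  # seconds per day


# Hypothetical example: a small node has to give up 1 TiB of blocks.
one_tib = 2 ** 40
print(round(days_to_move(one_tib), 1))
```

If the estimate comes out in weeks while new data keeps arriving, the balancer may simply never catch up.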
What threshold is used?
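The threshold matters because, as I understand it, the balancer compares each datanode's used percentage with the cluster-wide used percentage and only acts on nodes that differ by more than the threshold (10 by default, I believe). A minimal sketch of that comparison, with made-up node names and sizes that roughly match the figures reported in the thread:

```python
def classify(nodes, threshold_pct=10.0):
    """Label each node relative to the cluster-wide utilization.

    nodes: dict of name -> (used_tb, capacity_tb); values are hypothetical.
    Returns (cluster_avg_pct, dict of name -> "over" / "under" / "within").
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg = 100.0 * total_used / total_cap

    labels = {}
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util > avg + threshold_pct:
            labels[name] = "over"
        elif util < avg - threshold_pct:
            labels[name] = "under"
        else:
            labels[name] = "within"
    return avg, labels


# Hypothetical nodes: a 12 TB node at 99.9% and a 72 TB node at 40%.
nodes = {"small": (11.988, 12.0), "big": (28.8, 72.0)}
avg, labels = classify(nodes)
print(round(avg, 1), labels)
```

In this toy case the big node already sits within the threshold of the average while the small one is far over it, so a wide threshold can leave a visible imbalance in place.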
About the small and big datanodes, how are they distributed with regards to
About files, how are the replication factor(s) and block size(s) used?
Surely trivial questions again.
On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
> Sorry about that, had it written, but thought it was obvious.
> Yes, balancer is active and running on the namenode.
> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> It is not explicitly said but did you use the balancer?
> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <[EMAIL PROTECTED]> wrote:
>> I am using one of the old legacy versions (0.20) of Hadoop for our
>> cluster. We have scheduled an upgrade to a newer version within a
>> couple of months, but I would like to understand a couple of things before
>> moving ahead with the upgrade plan.
>> We have about 200 datanodes and some of them have larger storage than
>> others. The storage for the datanodes varies between 12 TB and 72 TB.
>> We found that the disk-used percentage is not symmetric across all the
>> datanodes. For larger storage nodes the percentage of disk space used is
>> much lower than that of other nodes with smaller storage space. On larger
>> storage nodes the percentage of used disk space varies, but averages
>> about 30-50%. For the smaller storage nodes this number is as high as
>> 99.9%. Is this expected? If so, then we are not using a lot of the disk
>> space effectively. Is this solved in a future release?
>> If not, I would like to know whether there are any checks or debugging
>> steps that one can do to improve this with the current version, or
>> whether upgrading Hadoop should solve this problem.
>> I am happy to provide additional information if needed.
>> Thanks for any help.