Re: disk used percentage is not symmetric on datanodes (balancer)
Tapas Sarangi 2013-03-19, 15:04

On Mar 19, 2013, at 5:00 AM, Alexey Babutin <[EMAIL PROTECTED]> wrote:

> node A = 12 TB
> node B = 72 TB
> How many A nodes and how many B nodes do you have out of the 200?

We have more A nodes than B nodes; the split is roughly 80/20. Note that not all the B nodes are 72 TB; that is the maximum. Similarly, 12 TB is the minimum for the A nodes.
 

> If you have more B than A you can deactivate A, clear it and apply again.

Apply what? That may not be an option for an active system, and it could cripple us for days.

> I suppose that the cluster is about 3-5 Tb. Run the balancer with threshold 0.2 or 0.1.

You meant 3.5 PB; in that case you are about right. What does this threshold do exactly? We are not setting the threshold manually, but isn't Hadoop's default 0.1?
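
For what it's worth, a sketch of how I understand the invocation on our 0.20 tree (if I read the usage right, -threshold is a percentage of total capacity, so the default would be 10, i.e. 10%, rather than 0.1):

    # start the balancer with a tighter 5% threshold (stock 0.20 scripts)
    $HADOOP_HOME/bin/start-balancer.sh -threshold 5

    # the balancer can be stopped at any time without harming the cluster
    $HADOOP_HOME/bin/stop-balancer.sh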

>
> Different servers in one rack is a bad idea. You should rebuild the cluster with multiple racks.

Why is that a bad idea? We are using Hadoop as a file system, not as a scheduler. How are multiple racks going to help balance disk usage across the datanodes?
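
For context, my understanding is that rack awareness on 0.20 is driven by a topology script named in core-site.xml via topology.script.file.name; a minimal sketch, with a hypothetical path and hypothetical IP ranges:

    #!/bin/sh
    # hypothetical /etc/hadoop/topology.sh; prints one rack id for each
    # datanode address the namenode passes as an argument
    while [ $# -gt 0 ]; do
      case "$1" in
        10.0.1.*) echo /rack1 ;;
        10.0.2.*) echo /rack2 ;;
        *)        echo /default-rack ;;
      esac
      shift
    done

As far as I can tell this only changes where replicas are placed, which is why I don't see how it addresses per-node disk usage.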

-Tapas
>
> 2013/3/19 Tapas Sarangi <[EMAIL PROTECTED]>
> Hello,
>
> I am using one of the old legacy versions (0.20) of Hadoop for our cluster. We have scheduled an upgrade to a newer version within a couple of months, but I would like to understand a couple of things before moving ahead with the upgrade plan.
>
> We have about 200 datanodes, and some of them have larger storage than others. The storage per datanode varies between 12 TB and 72 TB.
>
> We found that the disk-used percentage is not symmetric across the datanodes. On the larger-storage nodes the percentage of disk space used is much lower than on the nodes with smaller storage. On the larger nodes the used percentage varies, but it averages about 30-50%; on the smaller nodes it is as high as 99.9%. Is this expected? If so, we are not using a lot of the disk space effectively. Is this solved in a later release?
>
> If not, I would like to know whether there are any checks or debugging steps one can do to improve things on the current version, or whether upgrading Hadoop should solve the problem.
>
> I am happy to provide additional information if needed.
>
> Thanks for any help.
>
> -Tapas
>
>
