Hadoop, mail # user - balance blocks between small and bigger disks in the same datanode.


Re: balance blocks between small and bigger disks in the same datanode.
Patai Sangbutsarakum 2011-10-26, 22:33
Sorry for the late reply, and a big "Thank you", Harsh..

>You shouldn't be running into write errors with one full
> disk mount, as it will automatically be unselected for writes.
>

This gives me great peace of mind.

Regards,
P
On Tue, Oct 25, 2011 at 10:42 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Hi,
>
> The block writing mechanism does pay heed to remaining free space while
> choosing the disk. You shouldn't be running into write errors with one full
> disk mount, as it will automatically be unselected for writes.
>
> The free space measurement also takes the reservation property into account,
> which you have mentioned.
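(For reference, a sketch of how the reservation property discussed above is set in hdfs-site.xml. The 33G figure is taken from the df listing later in this thread and converted to bytes; it is an illustration, not a recommendation. Note the value applies to every dfs.data.dir volume on the datanode, not to one disk individually.)

```xml
<!-- hdfs-site.xml: reserve ~33 GB per volume for non-DFS use.
     Value is in bytes and applies to each dfs.data.dir volume. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>35433480192</value> <!-- 33 * 1024^3 bytes -->
</property>
```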
>
> And yes, I guess it could be better to decommission and recommission+balance
> if you can afford the time.
>
> The writes are round robin in nature, in terms of disk selection, btw.
>
> On Tuesday, October 25, 2011, Patai Sangbutsarakum <[EMAIL PROTECTED]>
> wrote:
>> Good morning Harsh,
>> Thanks for the late-night reply ;-)
>>
>>>> Quick q: were some disks added later, as part of this datanode?
>> there were no new disks added. I just planned to offload data blocks from
>> that small partition to the other, bigger partitions, but it seems to me
>> that bringing down 130 nodes just to move blocks is something that needs
>> to be seriously considered, and later on, if I ran the rebalancer,
>> /hadoop1 would get filled up again.
>>
>> Is there any way to tell Hadoop to stop using _a partition_ once its
>> free space hits a certain limit?
>>
>> As far as I have researched, this points to "dfs.datanode.du.reserved";
>> in this case, if I set dfs.datanode.du.reserved = (33G in bytes),
>>
>> will DFS continue using /hadoop2, /hadoop3... but stop placing more
>> blocks on /hadoop1?
>>
>> Please suggest,
>> -Patai
>>
>>
>>
>> On Tue, Oct 25, 2011 at 1:49 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>> Patai,
>>>
>>> 1. HDFS as the whole service.
>>> 2.1. Yes.
>>> 2.2. Yes, the directory parent must be current.
>>> 2.3. Yes you can move the whole subdirectory.
>>>
>>> Quick q: were some disks added later, as part of this datanode?
>>>
>>> On Tuesday, October 25, 2011, Patai Sangbutsarakum <[EMAIL PROTECTED]>
>>> wrote:
>>>> Hi All,
>>>>
>>>> I was looking into the FAQ, but I still have questions.
>>>> Datanodes in my production cluster are running low on space on one of
>>>> the dfs.data.dir directories:
>>>>
>>>>
>>>> /dev/sda5             355G   322G    33G  91% /hadoop1  <----
>>>> /dev/sdb1             484G   324G   161G  67% /hadoop2
>>>> /dev/sdc1             484G   318G   167G  66% /hadoop3
>>>>
>>>> /hadoop1 has had less space from the very beginning because its drive
>>>> is shared with the operating system.
>>>> I found one FAQ entry on the wiki page:
>>>> "3.12. On an individual data node, how do you balance the blocks on
>>>> the disk?
>>>>
>>>> Hadoop currently does not have a method by which to do this
>>>> automatically. To do this manually:
>>>>
>>>> 1. Take down the HDFS
>>>> 2. Use the UNIX mv command to move the individual block and meta
>>>> pairs from one directory to another on each host
>>>> 3. Restart the HDFS"
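(A minimal sketch of step 2 above. The path layout and block/meta file names mirror those shown later in this thread, but the demo below uses throwaway temp directories rather than real mounts, so it is safe to run anywhere.)

```shell
# Stand-ins for /hadoop1/data/current and /hadoop2/data/current.
src=$(mktemp -d)/data/current
dst=$(mktemp -d)/data/current
mkdir -p "$src" "$dst"

# A fake block/meta pair, named like the real files shown in this thread.
touch "$src/blk_-8816473533602921489" \
      "$src/blk_-8816473533602921489_78445781.meta"

# Step 1 would be stopping the datanode process here.
# Step 2: move the block file and its .meta file together; leaving one
# behind would give the datanode an orphaned block or metadata file.
mv "$src"/*8816473533602921489* "$dst"/

# Step 3 would be restarting the datanode.
ls "$dst"
```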
>>>>
>>>>
>>>> Question about step 1, "take down the HDFS":
>>>> does that mean the whole cluster, OR just the datanode process of a
>>>> datanode/tasktracker host?
>>>>
>>>> Questions about step 2:
>>>>
>>>> 2.1 "moving block and meta pairs"
>>>>
>>>> are the block and meta pairs referring to files like these:
>>>>
>>>> cd /hadoop1/data/current
>>>> $ ls -al *8816473533602921489*
>>>> -rw-rw-r-- 1 apps apps 1734467 Aug 27 21:03 blk_-8816473533602921489
>>>> -rw-rw-r-- 1 apps apps      63 Aug 27 21:03
>>>> blk_-8816473533602921489_78445781.meta
>>>>
>>>> ???
>>>>
>>>> 2.2 "from one directory to another on each host"
>>>>
>>>> does a block (and its meta file) from "current" have to land in the
>>>> "current" directory of another dfs.data.dir, e.g.
>>>> mv /hadoop1/data/current/*8816473533602921489* /hadoop2/data/current/
>>>>
>>>> or can it be a different directory name on the destination side?
>>>>
>>>>
>>>> 2.3 How about the subdirXX directories?
>>>>
>>>> under /hadoop1/data/current/
>>>> ....
>