Re: balance blocks between small and bigger disks in the same datanode.
Sorry for the late big "thank you", Harsh.

>You shouldn't be running into write errors with one full
> disk mount, as it will automatically be unselected for writes.
>

This gives me great peace of mind.

Regards,
P
On Tue, Oct 25, 2011 at 10:42 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Hi,
>
> The block writing mechanism does pay heed to remaining free space while
> choosing the disk. You shouldn't be running into write errors with one full
> disk mount, as it will automatically be unselected for writes.
>
> The free space measurement also takes the reservation property into account,
> which you have mentioned.
>
> And yes, I guess it could be better to decommission and recommission+balance
> if you can afford the time.
>
> The writes are round robin in nature, in terms of disk selection, btw.
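The selection behavior Harsh describes can be pictured with a toy sketch (the free-space numbers come from the df output later in the thread, but the 40G reserve is a made-up illustration; the real selection happens inside the datanode, not in a script):

```shell
# Toy model: cycle volumes round-robin, skipping any volume whose free
# space is below the reserve -- mirroring how a full /hadoop1 gets
# unselected for writes.
volumes=(/hadoop1 /hadoop2 /hadoop3)
free_gb=(33 161 167)   # free space per volume, from the df output below
reserve_gb=40          # hypothetical reserve, in GB

i=0
pick_volume() {
  local tries=0 v
  while [ "$tries" -lt "${#volumes[@]}" ]; do
    v=$((i % ${#volumes[@]}))
    i=$((i + 1)); tries=$((tries + 1))
    if [ "${free_gb[$v]}" -gt "$reserve_gb" ]; then
      echo "${volumes[$v]}"
      return 0
    fi
  done
  return 1   # every volume is below the reserve
}

pick_volume   # -> /hadoop2 (/hadoop1 skipped: 33G free < 40G reserve)
pick_volume   # -> /hadoop3
pick_volume   # -> /hadoop2 (round robin wraps; /hadoop1 still skipped)
```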
>
> On Tuesday, October 25, 2011, Patai Sangbutsarakum <[EMAIL PROTECTED]>
> wrote:
>> Good morning Harsh,
>> Thanks for the late-night reply ;-)
>>
>>>> Quick q: were some disks added later, as part of this datanode?
>> there are no new disks added. I had just planned to move data blocks off
>> that small partition onto the other, bigger partitions,
>> but it seems to me that bringing down 130 nodes just to move blocks is
>> something that needs to be seriously considered, and later on,
>> if I ran the rebalancer, /hadoop1 would just fill back up again.
>>
>> Is there any way to tell Hadoop to stop using _a partition_ once the
>> free space of that partition hits a certain limit?
>>
>> As far as I have researched, it points to "dfs.datanode.du.reserved";
>> in this case, if I put dfs.datanode.du.reserved = (33G in bytes),
>>
>> will DFS continue using /hadoop2, /hadoop3... but not place more blocks
>> on /hadoop1?
>>
>> Please suggest,
>> -Patai
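For reference, the property Patai asks about is set in hdfs-site.xml on each datanode; a sketch (the value is just 33 GB expressed in bytes, matching the df output later in the thread, and is illustrative only — note the reserve is applied to every dfs.data.dir volume, so it cannot target /hadoop1 alone):

```xml
<!-- hdfs-site.xml (per datanode) -- illustrative sketch -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- 33 GB in bytes: 33 * 1024^3 = 35433480192 -->
  <value>35433480192</value>
</property>
```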
>>
>>
>>
>> On Tue, Oct 25, 2011 at 1:49 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>> Patai,
>>>
>>> 1. HDFS as the whole service.
>>> 2.1. Yes.
>>> 2.2. Yes, the directory parent must be current.
>>> 2.3. Yes you can move the whole subdirectory.
>>>
>>> Quick q: were some disks added later, as part of this datanode?
>>>
>>> On Tuesday, October 25, 2011, Patai Sangbutsarakum <
> [EMAIL PROTECTED]>
>>> wrote:
>>>> Hi All,
>>>>
>>>> I was looking into the FAQ, but I still have questions.
>>>> Datanodes in my production are running low in the space of one of
>>> dfs.data.dir
>>>>
>>>>
>>>> /dev/sda5             --> 355G   322G    33G  91% /hadoop1  <----
>>>> /dev/sdb1             --> 484G   324G   161G  67% /hadoop2
>>>> /dev/sdc1                   484G   318G   167G  66% /hadoop3
>>>>
>>>> /hadoop1 has had less space from the very beginning because its drive
>>>> is shared with the operating system.
>>>> I found one FAQ in wiki page
>>>> "3.12. On an individual data node, how do you balance the blocks on the
>>> disk?
>>>>
>>>> Hadoop currently does not have a method by which to do this
>>>> automatically. To do this manually:
>>>>
>>>> 1. Take down the HDFS
>>>> 2. Use the UNIX mv command to move the individual block and meta
>>>> pairs from one directory to another on each host
>>>> 3. Restart the HDFS "
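The mv step above can be dry-run safely: the sketch below uses throwaway paths under /tmp (assumed, not from the thread) and the block/meta names quoted later in this message. On a real datanode you would stop the datanode first and point SRC/DST at the actual dfs.data.dir volumes:

```shell
# Self-contained dry run of the manual block move (step 2). SRC/DST are
# throwaway stand-ins for /hadoop1/data/current and /hadoop2/data/current.
SRC=/tmp/demo/hadoop1/data/current
DST=/tmp/demo/hadoop2/data/current
mkdir -p "$SRC" "$DST"

# Stand-in for a real block/meta pair (names follow the thread's example):
touch "$SRC/blk_-8816473533602921489" \
      "$SRC/blk_-8816473533602921489_78445781.meta"

# Move each block file together with its .meta file -- they must stay
# paired, and both must land under the destination volume's current/ dir.
for blk in "$SRC"/blk_*[0-9]; do
  mv "$blk" "$blk"_*.meta "$DST"/
done

ls "$DST"   # both files now live on the second volume
```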
>>>>
>>>>
>>>> Question about step 1, "take down the HDFS":
>>>> does that mean the whole cluster, OR just the datanode process of a
>>>> datanode/tasktracker host?
>>>>
>>>> Question of step 2,
>>>>
>>>> 2.1 "moving blk and meta pair."
>>>>
>>>> are blk and meta pairs referring to
>>>>
>>>> cd /hadoop1/data/current
>>>> $ ls -al *8816473533602921489*
>>>> -rw-rw-r-- 1 apps apps 1734467 Aug 27 21:03 blk_-8816473533602921489
>>>> -rw-rw-r-- 1 apps apps      63 Aug 27 21:03
>>>> blk_-8816473533602921489_78445781.meta
>>>>
>>>> ???
>>>>
>>>> 2.2 "from one directory to another on each host"
>>>>
>>>> do the blk (and meta) files from "current" have to land in the
>>>> "current" directory of another dfs.data.dir, e.g.
>>>> mv /hadoop1/data/current/*8816473533602921489* /hadoop2/data/current/
>>>>
>>>> or can they go to a directory with a different name on the destination side?
>>>>
>>>>
>>>> 2.3 how about subdirXX?
>>>>
>>>> under /hadoop1/data/current/
>>>> ....
>