Re: balance blocks between small and bigger disks in the same datanode.
Patai Sangbutsarakum 2011-10-25, 17:06
Good morning Harsh,
Thanks for the late-night reply ;-)
>> Quick q: were some disks added later, as part of this datanode?
No new disks were added. I just planned to offload data blocks from
that small partition to the other, bigger partitions,
but it seems to me that bringing down 130 nodes just to move blocks is
something that needs to be seriously considered, and later on,
if I run the rebalancer, /hadoop1 will fill back up again.
Is there any way to tell Hadoop to stop using _a partition_ once the free
space of that partition hits a certain limit?
As far as I have researched, this points to "dfs.datanode.du.reserved".
In this case, if I set dfs.datanode.du.reserved = (33G in bytes),
DFS would still continue using /hadoop2, /hadoop3... but not fill more blk
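
For reference, dfs.datanode.du.reserved takes a value in bytes and is applied per volume, so the same reservation would also hold on /hadoop2 and /hadoop3. A minimal sketch of the "33G in bytes" conversion, assuming the 33G reported by df means GiB (powers of 1024); gib_to_bytes is a hypothetical helper name, not a Hadoop tool:

```shell
#!/bin/sh
# Convert a size in GiB to bytes, the unit dfs.datanode.du.reserved expects.
# gib_to_bytes is a hypothetical helper used only for illustration.
gib_to_bytes() {
    # 1 GiB = 1024 * 1024 * 1024 bytes
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 33   # prints 35433480192
```

The printed number is what would go into the property value in hdfs-site.xml, if that is where the cluster keeps its datanode settings.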
On Tue, Oct 25, 2011 at 1:49 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> 1. HDFS as the whole service.
> 2.1. Yes.
> 2.2. Yes, the parent directory must be "current".
> 2.3. Yes you can move the whole subdirectory.
> Quick q: were some disks added later, as part of this datanode?
> On Tuesday, October 25, 2011, Patai Sangbutsarakum <[EMAIL PROTECTED]>
>> Hi All,
>> I was looking into the FAQ, but I still have questions.
>> Datanodes in my production cluster are running low on space on one of
>> their partitions:
>> Filesystem  Size  Used  Avail  Use%  Mounted on
>> /dev/sda5   355G  322G  33G    91%   /hadoop1  <----
>> /dev/sdb1   484G  324G  161G   67%   /hadoop2
>> /dev/sdc1   484G  318G  167G   66%   /hadoop3
>> /hadoop1 has had less space since the very beginning because its drive
>> is shared with the operating system.
>> I found this FAQ entry on the wiki page:
>> "3.12. On an individual data node, how do you balance the blocks on the
>> Hadoop currently does not have a method by which to do this
>> automatically. To do this manually:
>> 1 Take down the HDFS
>> 2 Use the UNIX mv command to move the individual blocks and meta
>> pairs from one directory to another on each host
>> 3 Restart the HDFS "
>> Question on step 1, "take down the HDFS":
>> does that mean the whole cluster, OR just the datanode process on a
>> datanode/tasktracker host?
>> Questions on step 2:
>> 2.1 "moving blk and meta pair."
>> Do "blk and meta pairs" refer to files like these?
>> $ cd /hadoop1/data/current
>> $ ls -al *8816473533602921489*
>> -rw-rw-r-- 1 apps apps 1734467 Aug 27 21:03 blk_-8816473533602921489
>> -rw-rw-r-- 1 apps apps 63 Aug 27 21:03
>> 2.2 "from one directory to another on each host"
>> Does the blk (and meta) from "current" have to land in the
>> "current" directory of another dfs.data.dir, like
>> mv /hadoop1/data/current/*8816473533602921489* /hadoop2/data/current/
>> or can the destination directory have a different name?
>> 2.3 how about subdirXX?
>> under /hadoop1/data/current/
>> 55G subdir36
>> 49G subdir37
>> It is so tempting to move subdir36 and subdir37 because they are huge.
>> Should it look like
>> mv /hadoop1/data/current/subdir36/* /hadoop2/data/current/subdir36/
>> Well... /hadoop2/data/current/subdir36/
>> also has a bunch of blk (and meta) files and a bunch of subdirectories,
>> which means that if I do the move, there might be some collisions?
>> Thanks in advance.
> Harsh J
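
The manual move discussed in this thread (a block file plus its meta file, from one "current" directory to another) could be sketched roughly as below. This is only an illustration under assumptions: HDFS is stopped as per the answers above, meta files are named blk_<id>_<genstamp>.meta, and move_block_pair is a hypothetical helper, not something shipped with Hadoop. It refuses to move anything if a file of the same name already exists at the destination, which is the collision worry raised in 2.3.

```shell
#!/bin/sh
# move_block_pair is a hypothetical helper, not part of Hadoop.
# Usage: move_block_pair SRC_DIR DST_DIR BLOCK_ID
# Moves blk_<ID> and its blk_<ID>_*.meta file together, and refuses to
# overwrite anything that already exists in DST_DIR.
move_block_pair() {
    src=$1; dst=$2; id=$3
    [ -d "$dst" ] || { echo "no such directory: $dst" >&2; return 1; }
    [ -f "$src/blk_$id" ] || { echo "missing block file: blk_$id" >&2; return 1; }
    # First pass: abort before moving anything if a name would collide.
    for f in "$src/blk_$id" "$src/blk_${id}"_*.meta; do
        [ -e "$f" ] || continue
        if [ -e "$dst/$(basename "$f")" ]; then
            echo "collision: $(basename "$f") already in $dst" >&2
            return 1
        fi
    done
    # Second pass: move the block file and its meta file.
    for f in "$src/blk_$id" "$src/blk_${id}"_*.meta; do
        [ -e "$f" ] || continue
        mv "$f" "$dst/" || return 1
    done
}

# Example call (paths and block ID taken from the thread, not executed here):
# move_block_pair /hadoop1/data/current /hadoop2/data/current -8816473533602921489
```

Checking for collisions before the first mv keeps a block and its meta file from being split across two volumes when only one of the names clashes.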