-
rack awareness and safemode
Patai Sangbutsarakum 2012-03-20, 20:20
Hadoopers!!
I am going to restart our Hadoop cluster in order to enable rack awareness for the first time. We're currently running 0.20.203 with 500TB of data on 250+ nodes (without rack awareness).
I'm worried that when I start DFS with rack awareness enabled, HDFS will sit in safe mode for hours while it relocates blocks to comply with the new rack placement.
Is there any knob I can turn to prevent that?
Thanks in advance, Patai
-
Re: rack awareness and safemode
John Meagher 2012-03-20, 20:27
Unless something has changed recently, it won't automatically relocate the blocks. When I did something similar, I had a script that walked through all the misreplicated files, increased their replication factor, and then dropped it back down. This triggered relocation of the blocks to meet the rack requirements.
Doing this worked, but it took about a week to run over a few hundred thousand misreplicated files.
Here's the script I used (all sorts of caveats: it assumes a replication factor of 3, has no real error handling, etc.)...
for f in `hadoop fsck / | grep "Replica placement policy is violated" | head -n80000 | awk -F: '{print $1}'`; do
  hadoop fs -setrep -w 4 $f
  hadoop fs -setrep 3 $f
done
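For anyone adapting John's loop, here is a minimal bash sketch of the same raise-then-restore trick with a few hardening tweaks: each path is listed once even when fsck reports several violating blocks for the same file, paths are quoted, and the restore step only runs if the temporary increase succeeded. It assumes the fsck line format matched by the grep above and the `hadoop` CLI on PATH; `TARGET_REP`, `MAX_FILES`, `violating_paths` and `fix_placement` are hypothetical names. Like the original, it sets every file back to one replication factor, so it is only safe where all files actually use that factor.

```shell
#!/usr/bin/env bash
# Sketch only, not a tested tool. Assumptions: bash, `hadoop` on PATH, and fsck
# lines of the form "<path>: Replica placement policy is violated ...".
# TARGET_REP and MAX_FILES are hypothetical knobs; TARGET_REP must equal the
# replication factor the files really use (John's script hardcodes 3).
TARGET_REP=${TARGET_REP:-3}
MAX_FILES=${MAX_FILES:-80000}

# Read fsck output on stdin; print each violating file path exactly once
# (fsck prints one line per violating block, so a file can repeat).
violating_paths() {
  grep 'Replica placement policy is violated' | awk -F: '{print $1}' | sort -u
}

# Raise replication by one and wait for it, then restore the target factor.
# Unlike the original loop, the restore is skipped if the raise failed.
fix_placement() {
  local f=$1
  hadoop fs -setrep -w $((TARGET_REP + 1)) "$f" &&
    hadoop fs -setrep "$TARGET_REP" "$f"
}

# Touch the cluster only when invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
  hadoop fsck / | violating_paths | head -n "$MAX_FILES" |
    while IFS= read -r f; do
      # </dev/null keeps the hadoop JVM from reading the loop's input.
      fix_placement "$f" </dev/null || echo "failed: $f" >&2
    done
fi
```

Without the `run` argument the file only defines the helpers, so `violating_paths` can be tried on saved fsck output before anything touches the cluster.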
-
Re: rack awareness and safemode
Patai Sangbutsarakum 2012-03-20, 20:38
Thanks for your reply and the script. Hopefully it still applies to 0.20.203. From what I've seen playing with a test cluster, the balancer takes care of replica placement. I just don't want to end up with HDFS sitting in safe mode for hours while users can't use Hadoop and start yelping.
Let's hear from others. Thanks, Patai
-
Re: rack awareness and safemode
Harsh J 2012-03-20, 21:44
John has already addressed your concern. I'd only add that fixing replication violations does not require your NN to be in safe mode, and it won't be. So you can put that worry to rest :)
-- Harsh J
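To confirm Harsh's point after the restart, a small sketch like the following checks the NameNode's safe mode state before kicking off the placement fixes. It assumes `hadoop dfsadmin -safemode get` prints a line of the form "Safe mode is ON" or "Safe mode is OFF"; `safemode_is_off` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: `hadoop` on PATH, and `hadoop dfsadmin -safemode get`
# reporting "Safe mode is ON" or "Safe mode is OFF".

# Read dfsadmin output on stdin; succeed only if safe mode is reported OFF.
safemode_is_off() {
  grep -q 'Safe mode is OFF'
}

# Query the live NameNode only when invoked with the argument "check".
if [ "${1:-}" = "check" ]; then
  if hadoop dfsadmin -safemode get | safemode_is_off; then
    echo "NameNode is out of safe mode; placement fixes can run online."
  else
    echo "NameNode is still in safe mode." >&2
    exit 1
  fi
fi
```

If the NameNode is still starting up, `hadoop dfsadmin -safemode wait` blocks until it leaves safe mode, which avoids polling.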
-
Re: rack awareness and safemode
Patai Sangbutsarakum 2012-03-20, 23:19
Thank you all.
-
Re: rack awareness and safemode
Patai Sangbutsarakum 2012-03-22, 17:36
I restarted the cluster yesterday with rack awareness enabled. Things went well; I can confirm there were no issues at all.
Thanks again, everyone.
-
Re: rack awareness and safemode
John Meagher 2012-03-22, 17:40
Make sure you run "hadoop fsck /". It should report a lot of blocks with the replica placement policy violated. In the short term that isn't anything to worry about, and everything will work fine even with those errors. Run the script I sent out earlier to fix them and bring everything into compliance with the new rack-awareness setup.
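To track progress while John's fix-up script works through the backlog, one option is to count the fsck violation lines before and after each batch. This is a sketch assuming the same "<path>: Replica placement policy is violated ..." line format that John's script greps for; `count_violations` is a hypothetical helper. It reports both violating blocks and distinct files, since fsck prints one line per block.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: `hadoop` on PATH and fsck lines of the form
# "<path>: Replica placement policy is violated ...".

# Read fsck output on stdin; print "<N> blocks in <M> files".
count_violations() {
  awk -F: '
    /Replica placement policy is violated/ { blocks++; files[$1] = 1 }
    END {
      n = 0
      for (f in files) n++
      printf "%d blocks in %d files\n", blocks, n
    }'
}

# Scan the whole namespace only when invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
  hadoop fsck / | count_violations
fi
```

A full `fsck /` on 500TB is not free, so running the count once per batch rather than in a tight loop is probably the better trade-off.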
-
Re: rack awareness and safemode
Patai Sangbutsarakum 2012-03-22, 17:55
Roger that