Accumulo dev mailing list


Re: Review Request 15650: ACCUMULO-1794 adds hdfs failover to continuous integration test.
On Wed, Nov 20, 2013 at 11:22 AM, Sean Busbey <[EMAIL PROTECTED]> wrote:

>
>
> > On Nov. 20, 2013, 4:16 p.m., kturner wrote:
> > > test/system/continuous/hdfs-agitator.pl, line 104
> > > <https://reviews.apache.org/r/15650/diff/1/?file=388001#file388001line104>
> > >
> > >     What are the pros and cons of using this haadmin command vs
> > >     killing namenode processes?
>
> Pro haadmin:
>
> * The underlying HDFS instance may not be configured for automatic
> failover.
> * The haadmin command doesn't require knowing where the NameNode processes
> are running within the cluster.
> * The haadmin tool is a publicly exposed way of saying "do a failover",
> whereas finding the NameNode to kill will be a heuristic.
>
> Pro killing namenode:
>
> * It is needed if you specifically want to test what happens when the
> automatic failover process kicks in.
>
> Note that I don't think the case for killing the namenode is a strong one.
> The haadmin command still needs to transition the active to standby and
> then the standby to active, so systems above HDFS will already encounter
> conditions such as gaps with no active namenode.
>
>
>
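For illustration only, here is a minimal Python sketch of the haadmin-driven approach Sean describes: given each NameNode's current HA state, it picks the active NameNode and a standby and builds the `hdfs haadmin -failover <from> <to>` argument list. The service IDs (`nn1`, `nn2`) and the function name `plan_failover` are hypothetical; the states are assumed to come from `hdfs haadmin -getServiceState <serviceId>`. This is not the code from hdfs-agitator.pl.

```python
"""Hypothetical sketch: plan an HDFS HA failover with haadmin.

Assumes the caller already knows each NameNode service ID and its HA state
(for example from `hdfs haadmin -getServiceState <serviceId>`).
"""
from typing import Dict, List


def plan_failover(states: Dict[str, str]) -> List[str]:
    """Return the argv that fails over from the active NameNode to a standby.

    `states` maps a service ID (e.g. the hypothetical "nn1") to "active" or
    "standby".
    """
    active = [sid for sid, st in states.items() if st == "active"]
    standby = sorted(sid for sid, st in states.items() if st == "standby")
    if len(active) != 1 or not standby:
        raise ValueError(f"cannot plan a failover from states {states!r}")
    # -failover <from> <to>: make <from> standby, then make <to> active.
    return ["hdfs", "haadmin", "-failover", active[0], standby[0]]
```

Passing the returned list to `subprocess.run(argv, check=True)` would perform the failover; keeping the planning step separate from execution lets it be checked without a cluster, and no knowledge of where the NameNode processes run is required.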
Review Board is not working, so I am responding here on the dev list. I
suspect killing the processes would yield slightly more realistic test
results, but it certainly makes our scripts more unwieldy. Maybe a better
approach is to work toward moving HDFS agitation into HDFS itself.

Taking things a bit further, killing processes is not as effective a test
as actually killing machines, because it does not expose issues like
unflushed data in OS caches.

On to another issue: does the script ever kill all HA namenodes? Is this
possible with haadmin?
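One way to explore that question is sketched below in Python, under loudly stated assumptions: the cluster uses manual HA (automatic failover disabled, since haadmin is expected to refuse manual state transitions when automatic failover is enabled), and the service IDs are hypothetical. It only builds `hdfs haadmin -transitionToStandby` and `-transitionToActive` argument lists; it does not run them.

```python
"""Hypothetical sketch: haadmin commands that leave no active NameNode.

Assumes manual HA (automatic failover disabled) and illustrative service IDs.
"""
from typing import List


def all_standby_commands(service_ids: List[str]) -> List[List[str]]:
    # One -transitionToStandby per NameNode. Once all have run, no NameNode
    # is active, so HDFS operations that need one cannot complete until a
    # NameNode is transitioned back to active.
    return [["hdfs", "haadmin", "-transitionToStandby", sid] for sid in service_ids]


def restore_command(service_id: str) -> List[str]:
    # Bring a single NameNode back to the active state.
    return ["hdfs", "haadmin", "-transitionToActive", service_id]
```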
> - Sean
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15650/#review29167
> -----------------------------------------------------------
>
>
> On Nov. 18, 2013, 5:13 p.m., Sean Busbey wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/15650/
> > -----------------------------------------------------------
> >
> > (Updated Nov. 18, 2013, 5:13 p.m.)
> >
> >
> > Review request for accumulo and Alex Moundalexis.
> >
> >
> > Bugs: ACCUMULO-1794
> >     https://issues.apache.org/jira/browse/ACCUMULO-1794
> >
> >
> > Repository: accumulo
> >
> >
> > Description
> > -------
> >
> > ACCUMULO-1794 adds hdfs failover to continuous integration test.
> >
> >
> > Diffs
> > -----
> >
> >   test/system/continuous/continuous-env.sh.example 830ae86b5bf2398a840b853423755f6dd65f2dc0
> >   test/system/continuous/hdfs-agitator.pl PRE-CREATION
> >   test/system/continuous/start-agitator.sh 52e5a4e82a4564fa624a71f73ad29fa20ba23246
> >   test/system/continuous/stop-agitator.sh b853a55b12f8402606af52e0748ca50daf95ed7f
> >
> > Diff: https://reviews.apache.org/r/15650/diff/
> >
> >
> > Testing
> > -------
> >
> > Ran the HDFS agitator on a CDH4 cluster configured for HA. It
> > successfully caused the active namenode to fail over as it went.
> >
> >
> > Thanks,
> >
> > Sean Busbey
> >
> >
>
>
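The testing described in the review request (repeated failovers of the active NameNode while the continuous test runs) can be sketched as a periodic loop. This is a hypothetical Python illustration, not the contents of hdfs-agitator.pl, which is not shown in this thread: the function name, the sleep bounds, and the assumption that the first service ID starts out active are all illustrative. The command runner and sleep function are injectable so the loop can be exercised without a cluster.

```python
"""Hypothetical sketch of a periodic HDFS failover agitator loop."""
import random
import subprocess
import time
from typing import Callable, List, Optional, Sequence


def run_checked(argv: List[str]) -> None:
    # Raise if haadmin fails, so a broken failover is not silently ignored.
    subprocess.run(argv, check=True)


def agitate(
    service_ids: Sequence[str],
    rounds: int,
    min_sleep: float,
    max_sleep: float,
    run: Callable[[List[str]], None] = run_checked,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> str:
    """Fail over between NameNodes `rounds` times; return the final active ID.

    Assumes service_ids[0] is active at the start (a simplification; a real
    agitator would query `hdfs haadmin -getServiceState`).
    """
    if len(service_ids) < 2:
        raise ValueError("need at least two NameNode service IDs")
    if rng is None:
        rng = random.Random()
    active = 0
    for _ in range(rounds):
        # Random pauses make failovers land at unpredictable points in the
        # continuous ingest and verify workload.
        sleep(rng.uniform(min_sleep, max_sleep))
        target = (active + 1) % len(service_ids)
        run(["hdfs", "haadmin", "-failover", service_ids[active], service_ids[target]])
        active = target
    return service_ids[active]
```

With two NameNodes the loop simply alternates the active role; with the default `run_checked`, a failed haadmin call stops the agitator instead of letting it continue in an unknown state.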