HBase, mail # dev - Online snapshots progress.


Re: Online snapshots progress.
Jesse Yates 2012-12-14, 18:11
Loving the extensive testing, Jon - good stuff.

> Basically, there are two meta reads: one to get the list of servers
> involved, and one after the snapshot is taken to verify that all regions
> in the snapshot match up with the regions in meta at that point in time.
>
> I believe moves/balances while a snapshot is in progress can cause some
> region servers to be missed, and that splits may make new regions
> appear in meta that do not exist in the just-taken snapshot, causing
> the snapshot verifier to reject the snapshot.
>

Yeah, that's the right intuition, as long as nothing has really changed in
the code, from what I remember :)
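The two-read check described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not HBase's code: META is modeled as a plain dict from region name to hosting server, and servers_for_table, snapshot_regions and verify are hypothetical names chosen for illustration. It shows how a region that moves to a server outside the first read's server list, or daughter regions created by a split, make the final verification reject the snapshot.

```python
"""Toy model of the two-META-read online snapshot check (hypothetical, not HBase code)."""


def servers_for_table(meta):
    """First META read: the set of servers hosting the table's regions.

    `meta` is a simplification: a dict of region name -> hosting server.
    """
    return set(meta.values())


def snapshot_regions(servers, current_meta):
    """Each contacted server snapshots the regions it hosts at snapshot time.

    A region that has moved to a server outside `servers` is silently skipped.
    """
    return {region for region, host in current_meta.items() if host in servers}


def verify(snapshot, meta_after):
    """Second META read: accept only if the snapshot covers exactly META's regions."""
    return set(snapshot) == set(meta_after)


# Happy path: nothing moves or splits while the snapshot is being taken.
meta = {"r1": "rs1", "r2": "rs2"}
servers = servers_for_table(meta)
print(verify(snapshot_regions(servers, meta), meta))  # True

# Balancer moves r2 to rs3 (not in the server list) before rs2 snapshots it.
moved = {"r1": "rs1", "r2": "rs3"}
print(verify(snapshot_regions(servers, moved), moved))  # False: r2 was missed

# r1 splits into r1a/r1b after the snapshot is taken.
snap = snapshot_regions(servers, meta)
split = {"r1a": "rs1", "r1b": "rs1", "r2": "rs2"}
print(verify(snap, split))  # False: META has daughters the snapshot lacks
```

In both failure cases the verifier cannot tell whether the snapshot covers every region, so rejecting it is the conservative choice; the real verifier likely compares more than region names, which this sketch ignores.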

-------------------
Jesse Yates
@jesse_yates
jyates.github.com

On Fri, Dec 14, 2012 at 10:08 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:

>
>
> Jon.
>
> On Fri, Dec 14, 2012 at 9:37 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Thanks for the update, Jon.
> >
> > bq. if splits or balancing occur while snapshotting, the region moves
> > cause the final snapshot verification step to abort
> >
> > The split or balancing happened during the snapshot verification step, right?
> >
> > On Fri, Dec 14, 2012 at 9:17 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> >
> > > Hey folks,
> > >
> > > I've been testing and finding bugs on a branch of online snapshots for
> > > the past few days. The good news is that taking an online snapshot seems
> > > to be fairly robust: I've been taking online snapshots as quickly as
> > > possible on a 5-node cluster being battered by a PerformanceEvaluation
> > > (PE) random-write run.
> > >
> > > As expected, we ran into some hiccups. In my last run of
> > > PE/online-snapshotting, it looks like 88/100 snapshots succeeded. This
> > > is OK; some failures are actually expected (the first cut only claims
> > > better consistency than 'copytable' and 'only-on-a-sunny-day'
> > > semantics). From a quick look at what caused the failed cases, if splits
> > > or balancing occur while snapshotting, the region moves cause the final
> > > snapshot verification step to abort, because we look for the new regions
> > > and don't know whether we have all regions. We've also found some
> > > problems with splits of hfilelinks (HBASE-7339), we've encountered
> > > occasional failed or hung clone attempts (HBASE-7352), and we've seen an
> > > occasional ZK-related slow abort. As they are found and characterized,
> > > I've been filing them under HBASE-6055 (offline snapshots) or HBASE-7290
> > > (online snapshots).
> > >
> > > I'm going to switch from bug-fixing mode back to patch-polishing mode
> > > today to get some of this committed to the snapshot dev branch. Here's
> > > how I hope to deal with them moving forward.
> > >
> > > I'll be polishing the pieces I've been testing (there are about 5-7
> > > patches in flight currently) and putting updated pieces up for review.
> > > There is non-trivial overhead in maintaining this many patches "in the
> > > future". Since this is a dev branch, I'm going to ask that these initial
> > > big dev-branch reviews focus on understandability, and that your +1s let
> > > us punt to follow-on JIRAs and TODOs more frequently than if you were
> > > reviewing for trunk. The sooner we get the skeleton in, the easier
> > > collaboration will be with other folks working on and testing the same
> > > branch. Ideally, getting the large pieces in would make follow-ons easier
> > > to review and tackle. The promise here, of course, is that many of these
> > > follow-on JIRAs and bugs (deadlocks, hangs), along with testing evidence,
> > > will be blockers before merging offline snapshots to trunk and merging
> > > online snapshots to trunk.
> > >
> > > Sound good?
> > >
> > > We initially had one snapshot branch (offline snapshots), but I'm
> > > proposing having two: the offline-snapshot branch and the online-snapshot
> > > branch. Jesse's been the master of the offline branch and pushing