HBase dev mailing list: Online snapshots progress

Re: Online snapshots progress.
Loving the extensive testing, Jon - good stuff.

> Basically, there are two meta reads -- once to get the list of servers
> involved, and once after the snapshot is taken to verify that all regions
> in the snapshot match up with the regions in meta at that point in time.
>
> I believe moves/balances while a snapshot is in progress will cause some
> RSs to potentially be missed, and that splits may make new regions
> appear in meta that do not exist in a just-taken snapshot and thus cause
> the snapshot verifier to reject the snapshot.
>

Yeah, that's the right intuition, as long as nothing has really changed in
the code, from what I remember :)
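For readers following along, here is a minimal sketch of the two-meta-read verification Jon describes, in Java to match HBase. Everything in it is hypothetical: regions and servers are plain strings, "meta" is modeled as a Map from region name to hosting server, and serversInvolved, takeSnapshot and verify are illustrative names, not HBase's actual snapshot classes. It also assumes "match up" means exact set equality between the snapshot's regions and meta's regions at the second read.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the online-snapshot verification flow; not HBase code.
final class SnapshotVerifierSketch {

    // Meta read 1: the set of region servers currently hosting the table's regions.
    static Set<String> serversInvolved(Map<String, String> regionToServer) {
        return new HashSet<>(regionToServer.values());
    }

    // Each involved server snapshots the regions it hosts at snapshot time.
    // A region that was moved to a server outside the list is silently missed.
    static Set<String> takeSnapshot(Set<String> servers, Map<String, String> currentAssignment) {
        Set<String> captured = new TreeSet<>();
        for (Map.Entry<String, String> e : currentAssignment.entrySet()) {
            if (servers.contains(e.getValue())) {
                captured.add(e.getKey());
            }
        }
        return captured;
    }

    // Meta read 2: accept the snapshot only if its regions exactly match meta now.
    static boolean verify(Set<String> snapshotRegions, Map<String, String> metaAfter) {
        return snapshotRegions.equals(metaAfter.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> meta = Map.of("r1", "rs1", "r2", "rs1", "r3", "rs2");
        Set<String> servers = serversInvolved(meta);

        // Quiet cluster: nothing moves, so verification passes.
        Set<String> quiet = takeSnapshot(servers, meta);
        System.out.println(verify(quiet, meta)); // true

        // Balancer moves r3 to rs3 mid-snapshot: r3 is never captured, so verification fails.
        Map<String, String> moved = Map.of("r1", "rs1", "r2", "rs1", "r3", "rs3");
        System.out.println(verify(takeSnapshot(servers, moved), moved)); // false

        // r2 splits after rs1 captured it: meta now lists daughters the snapshot lacks.
        Map<String, String> split = Map.of("r1", "rs1", "r2a", "rs1", "r2b", "rs1", "r3", "rs2");
        System.out.println(verify(quiet, split)); // false
    }
}
```

Exact equality is deliberately conservative in this model: it rejects some snapshots that might have been usable rather than risk accepting one that is missing regions.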

-------------------
Jesse Yates
@jesse_yates
jyates.github.com

On Fri, Dec 14, 2012 at 10:08 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:

>
>
> Jon.
>
> On Fri, Dec 14, 2012 at 9:37 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Thanks for the update, Jon.
> >
> > bq. if splits or balancing occurs while snapshotting, the region moves
> > cause the final snapshot verification step to abort
> >
> > The split or balancing happened during the snapshot verification step, right?
> >
> > On Fri, Dec 14, 2012 at 9:17 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> >
> > > Hey folks,
> > >
> > > I've been testing and finding bugs on a branch of online snapshots for the
> > > past few days. The good news is that taking an online snapshot seems to be
> > > fairly robust -- I've been taking online snapshots as quickly as possible
> > > on a 5-node cluster being battered by a performance eval (PE) random write
> > > run.
> > >
> > > As expected, we ran into some hiccups. In my last run of
> > > PE/online-snapshotting, it looks like 88/100 snapshots succeeded. This is
> > > OK; some failures are actually expected (the first cut only claims better
> > > consistency than 'copytable' and 'only-on-a-sunny-day' semantics). From a
> > > quick look at what caused the failed cases, if splits or balancing occur
> > > while snapshotting, the region moves cause the final snapshot verification
> > > step to abort, because we look for the new regions and don't know if we
> > > have all regions. We've also found some problems with splits of
> > > HFileLinks (HBASE-7339), an occasional failed or hung clone attempt
> > > (HBASE-7352), and an occasional ZK-related slow abort. As they are found
> > > and characterized, I've been filing them under HBASE-6055
> > > (offline snapshots) or HBASE-7290 (online snapshots).
> > >
> > > I'm going to switch from bug-fixing mode back to patch-polishing mode today
> > > to get some of this committed to the snapshot dev branch. Here's how I
> > > hope to deal with them moving forward.
> > >
> > > I'll be polishing the pieces I've been testing (there are about 5-7 patches
> > > in flight currently) and putting updated pieces up for review. There is
> > > non-trivial overhead in maintaining this many patches "in the future". Since
> > > this is a dev branch, I'm going to ask that reviews of these initial big
> > > dev-branch patches focus on understandability, and that your +1s let us
> > > punt to follow-on JIRAs and TODOs more frequently than if you were
> > > reviewing for trunk. The sooner we get the skeleton in, the easier
> > > collaboration will be with other folks working on and testing the same
> > > branch. Ideally, getting the large pieces in would make follow-ons easier
> > > to review and tackle. The promise here, of course, is that many of these
> > > follow-on JIRAs, bugs (deadlocks, hangs), and testing evidence will be
> > > blockers before merging offline snapshots to trunk and merging online
> > > snapshots to trunk.
> > >
> > > Sound good?
> > >
> > > We initially had one snapshot branch (offline snapshots), but I'm
> > > proposing having two: the offline-snapshot branch and the online-snapshot
> > > branch. Jesse's been the master of the offline branch and pushing