HBase, mail # user - Recovering corrupt HLog files


Re: Recovering corrupt HLog files
Bryan Beaudreault 2012-07-01, 05:07
Thanks all for the additional input.  I no longer think the HLogs are
corrupted; at least, I think they only appeared so because we had also lost
a good portion of our datanodes.  We have since recovered all the datanodes,
so they are good.

We will look into creating an executable jar out of your WALPlayer class
if it comes to that.  Right now we are happy that the two systems powered
by this were engineered accordingly: idempotent, fail-fast Hadoop jobs for
one, and a BookKeeper WAL for the other.
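
To illustrate the idempotence point made in the quoted reply below, here is
a minimal Python sketch. It is a toy model, not the HBase API:
`VersionedStore` and its methods are hypothetical names. It assumes only
that each cell keeps one value per timestamp and that a read returns the
version with the newest timestamp.

```python
class VersionedStore:
    """Toy model of a timestamp-versioned cell store (hypothetical, not HBase)."""

    def __init__(self):
        # (row, column) -> {timestamp: value}
        self._cells = {}

    def put(self, row, column, timestamp, value):
        # Writing the same (row, column, timestamp) again stores the same
        # version, so applying a put a second time leaves the store unchanged.
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column):
        # A read returns the version with the highest timestamp.
        versions = self._cells.get((row, column))
        if not versions:
            return None
        return versions[max(versions)]

    def increment(self, row, column, timestamp, amount):
        # Read-modify-write: the result depends on the current value,
        # so replaying an increment changes the outcome.
        current = self.get(row, column) or 0
        self.put(row, column, timestamp, current + amount)


store = VersionedStore()
store.put("row1", "cf:a", 100, "old")   # edit that was recorded in the log
store.put("row1", "cf:a", 200, "new")   # write that arrived after recovery
before_replay = store.get("row1", "cf:a")

store.put("row1", "cf:a", 100, "old")   # replay the logged edit
after_replay = store.get("row1", "cf:a")

# Increments are the exception: replaying one applies it twice.
store.increment("row1", "cf:n", 300, 5)
store.increment("row1", "cf:n", 300, 5)  # replay of the same increment
counter = store.get("row1", "cf:n")
```

In this model the replayed put at timestamp 100 does not hide the newer
write at timestamp 200, while the replayed increment is counted twice.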

On Sat, Jun 30, 2012 at 11:43 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> I added it in HBase 0.94, but the code is pretty isolated and could be
> easily (I think) ported to HBase 0.90.
> Could potentially be just turned into a separate jar file.
>
>
> As for whether it'll do the right thing... It uses the timestamps provided
> solely to pick the right set of HLogs instead of playing them all.
> But since all operations (except Increment/Append) are idempotent with
> timestamp, playing them again has no effect (i.e. your newer versions will
> still be visible, since they have a newer timestamp).
>
> As these files are corrupt there's no way of knowing how far WALPlayer
> will get in writing them. It will at least play until the first corruption
> identified.
>
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Bryan Beaudreault <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc:
> Sent: Saturday, June 30, 2012 12:29 PM
> Subject: Re: Recovering corrupt HLog files
>
> I should have mentioned in my initial email that I am operating on HBase
> 0.90.4.  Is WALPlayer available in this version?  I am having trouble
> finding it or anything similar.
>
> On Sat, Jun 30, 2012 at 1:14 PM, Li Pi <[EMAIL PROTECTED]> wrote:
>
> > WALPlayer will look at the timestamp. Replaying an older edit that has
> > since been overwritten shouldn't change anything.
> >
> > On Sat, Jun 30, 2012 at 9:49 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
> >
> > > They are all pretty large, around 40+mb.  Will the walplayer be smart
> > > enough to only write edits that still look relevant (i.e. based on
> > > timestamps of the edits vs timestamps of the versions in hbase)?  Writes
> > > have been coming in since we recovered.
> > >
> > > On Sat, Jun 30, 2012 at 11:05 AM, Stack <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Sat, Jun 30, 2012 at 8:38 AM, Bryan Beaudreault
> > > > <[EMAIL PROTECTED]> wrote:
> > > > > 12/06/30 00:00:48 INFO wal.HLogSplitter: Got while parsing hlog
> > > > > hdfs://my-namenode-ip-addr:8020/hbase/.logs/my-rs-ip-addr,60020,1338667719591/my-rs-ip-addr%3A60020.1340935453874.
> > > > > Marking as corrupted
> > > > >
> > > >
> > > > What size do these logs have?
> > > >
> > > > > We are back to stable operating now, and in trying to research this
> > > > > I found the hdfs://my-namenode-ip-addr:8020/hbase/.corrupt directory.
> > > > > There are 20 files listed there.
> > > > >
> > > >
> > > > Ditto.
> > > >
> > > > > What are our options for tracking down and potentially recovering
> > > > > any data that was lost?  Or how can we even tell what was lost, if
> > > > > any?  Does the existence of these files pretty much guarantee data
> > > > > loss?  There doesn't seem to be much documentation on this.  From
> > > > > reading it seems like it might be possible that part of each of
> > > > > these files was recovered.
> > > > >
> > > >
> > > > If size > 0, could try walplaying them:
> > > > http://hbase.apache.org/book.html#walplayer
> > > >
> > > > St.Ack
> > > >
> > >
> >
>
>
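
The thread above notes that a replay of a corrupt log will at least play
until the first corruption identified. A minimal Python sketch of that
behaviour follows. The record framing here (a 4-byte length and a 4-byte
CRC32 before each payload) is an assumption made up for illustration; it is
not the HLog file format, and `encode_record` and `read_until_corrupt` are
hypothetical helpers.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # payload length, CRC32 of payload (assumed framing)


def encode_record(payload: bytes) -> bytes:
    """Frame one payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_until_corrupt(data: bytes):
    """Return (records read before the first bad record, offset where reading stopped)."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        # Stop at a truncated record or a checksum mismatch.
        if len(payload) < length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        offset = start + length
    return records, offset


log = b"".join(encode_record(p) for p in (b"edit-1", b"edit-2", b"edit-3"))

# Damage the first payload byte of the second record.
damaged = bytearray(log)
second_payload_start = len(encode_record(b"edit-1")) + HEADER.size
damaged[second_payload_start] ^= 0xFF

recovered, stopped_at = read_until_corrupt(bytes(damaged))
```

Here only the first record is recovered from the damaged log; everything
from the first bad record onward is left unread, which is why the amount of
data a replay restores depends on where the corruption sits in each file.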