Bryan Beaudreault
2012-06-30, 06:38
Stack
2012-06-30, 15:05
Bryan Beaudreault
2012-06-30, 16:49
Li Pi
2012-06-30, 17:14
Bryan Beaudreault
2012-06-30, 19:29
Li Pi
2012-06-30, 20:34
Jerry Lam
2012-07-01, 01:55
lars hofhansl
2012-07-01, 03:43
Bryan Beaudreault
2012-07-01, 05:07
-
Recovering corrupt HLog files
Bryan Beaudreault 2012-06-30, 06:38
Hello all,

In an AWS outage we lost about a fifth of our regionservers and about an eighth of our total datanodes. Despite a replication factor of 3, it appears we may have lost some data from corrupt HLogs. Looking at my HMaster, I see messages like this:

12/06/30 00:00:48 INFO wal.HLogSplitter: Got while parsing hlog hdfs://my-namenode-ip-addr:8020/hbase/.logs/my-rs-ip-addr,60020,1338667719591/my-rs-ip-addr%3A60020.1340935453874. Marking as corrupted

We are back to stable operation now, and while researching this I found the hdfs://my-namenode-ip-addr:8020/hbase/.corrupt directory. There are 20 files listed there.

What are our options for tracking down and potentially recovering any data that was lost? How can we even tell what was lost, if anything? Does the existence of these files pretty much guarantee data loss? There doesn't seem to be much documentation on this. From what I have read, it seems possible that part of each of these files was recovered.

Any help would be appreciated.

Thanks!
Bryan
-
Re: Recovering corrupt HLog files
Stack 2012-06-30, 15:05
On Sat, Jun 30, 2012 at 8:38 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
> 12/06/30 00:00:48 INFO wal.HLogSplitter: Got while parsing hlog
> hdfs://my-namenode-ip-addr:8020/hbase/.logs/my-rs-ip-addr,60020,1338667719591/my-rs-ip-addr%3A60020.1340935453874.
> Marking as corrupted

What size do these logs have?

> We are back to stable operating now, and in trying to research this I found
> the hdfs://my-namenode-ip-addr:8020/hbase/.corrupt directory. There are 20
> files listed there.

Ditto.

> What are our options for tracking down and potentially recovering any data
> that was lost. Or how can we even tell what was lost, if any? Does the
> existence of these files pretty much guarantee data lost? There doesn't
> seem to be much documentation on this. From reading it seems like it might
> be possible that part of each of these files was recovered.

If size > 0, could try walplaying them: http://hbase.apache.org/book.html#walplayer

St.Ack
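
For readers on an HBase release that ships WALPlayer, the suggestion above can be sketched as a small helper that assembles the command line. This is only a sketch: it assumes the invocation form given in the linked HBase book section (`hbase org.apache.hadoop.hbase.mapreduce.WALPlayer <wal inputdir> <tables> [<tableMappings>]`), the helper name `walplayer_command` is hypothetical, and the table name `my_table` is a placeholder that does not appear in this thread.

```python
import shlex

# Class name as documented in the HBase book's WALPlayer section.
WALPLAYER_CLASS = "org.apache.hadoop.hbase.mapreduce.WALPlayer"


def walplayer_command(wal_dir, tables, table_mappings=None):
    """Build the argv for a WALPlayer run (hypothetical helper, nothing is executed).

    wal_dir        -- directory holding the WAL files to replay
    tables         -- names of the tables whose edits should be replayed
    table_mappings -- optional target table names, one per entry in `tables`
    """
    if not tables:
        raise ValueError("at least one table name is required")
    if table_mappings is not None and len(table_mappings) != len(tables):
        raise ValueError("table_mappings must match tables one-to-one")

    argv = ["hbase", WALPLAYER_CLASS, wal_dir, ",".join(tables)]
    if table_mappings:
        argv.append(",".join(table_mappings))
    return argv


if __name__ == "__main__":
    # "my_table" is a placeholder; substitute the tables the corrupt logs belong to.
    argv = walplayer_command(
        "hdfs://my-namenode-ip-addr:8020/hbase/.corrupt", ["my_table"]
    )
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the command rather than running it leaves room to review it first, which seems prudent when the input files are already known to be damaged.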
-
Re: Recovering corrupt HLog files
Bryan Beaudreault 2012-06-30, 16:49
They are all pretty large, around 40+ MB. Will WALPlayer be smart enough to only write edits that still look relevant (i.e. based on timestamps of the edits vs. timestamps of the versions in HBase)? Writes have been coming in since we recovered.
-
Re: Recovering corrupt HLog files
Li Pi 2012-06-30, 17:14
WALPlayer will look at the timestamp. Replaying an older edit that has since been overwritten shouldn't change anything.
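
The claim that replaying an older edit changes nothing can be illustrated with a toy model of timestamped cell versions. This is a sketch under stated assumptions, not the HBase client API: the `VersionedStore` class and `replay` function are hypothetical, and the model ignores deletes, version limits, and TTLs.

```python
from collections import defaultdict


class VersionedStore:
    """Toy model: each cell keeps every written version, keyed by timestamp."""

    def __init__(self):
        # (row, column) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        # Writing the same (timestamp, value) twice leaves the cell unchanged.
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column):
        """Return the value with the newest timestamp, or None if absent."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        return versions[max(versions)]


def replay(store, edits):
    """Apply logged edits, each a (row, column, timestamp, value) tuple."""
    for row, column, timestamp, value in edits:
        store.put(row, column, timestamp, value)


if __name__ == "__main__":
    store = VersionedStore()
    store.put("row1", "cf:q", 100, "old")  # edit that is also in the log
    store.put("row1", "cf:q", 200, "new")  # write that arrived after recovery

    # The log holds the old edit plus one edit that never reached the store.
    replay(store, [("row1", "cf:q", 100, "old"), ("row2", "cf:q", 150, "lost")])

    print(store.get("row1", "cf:q"))  # the newer version stays visible
    print(store.get("row2", "cf:q"))  # the missing edit is restored
```

In this model a replayed put either rewrites an identical version or fills in a missing one, so the newest timestamp still determines what a read returns.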
-
Re: Recovering corrupt HLog files
Bryan Beaudreault 2012-06-30, 19:29
I should have mentioned in my initial email that I am operating on HBase 0.90.4. Is WALPlayer available in this version? I am having trouble finding it or anything similar.
-
Re: Recovering corrupt HLog files
Li Pi 2012-06-30, 20:34
Nope. It came out in 0.94 otoh.
-
Re: Recovering corrupt HLog files
Jerry Lam 2012-07-01, 01:55
This is interesting because I have seen this happen in the past. Can WALPlayer be backported to 0.90.x?

Best Regards,
Jerry
-
Re: Recovering corrupt HLog files
lars hofhansl 2012-07-01, 03:43
I added it in HBase 0.94, but the code is pretty isolated and could easily (I think) be ported to HBase 0.90. It could potentially just be turned into a separate jar file.

As for whether it'll do the right thing: it uses the timestamps provided solely to pick the right set of HLogs, instead of playing them all. But since all operations (except Increment/Append) are idempotent with timestamp, playing them again has no effect (i.e. your newer versions will still be visible, since they have a newer timestamp).

As these files are corrupt, there's no way of knowing how far WALPlayer will get in writing them. It will at least play until the first corruption identified.

-- Lars
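
The behavior of playing a log "until the first corruption identified" can be sketched with a made-up record format. The layout here (a 4-byte big-endian length, the payload, then a 4-byte CRC32 of the payload) is an assumption chosen purely for illustration and is not the real HLog on-disk format; `encode_record` and `read_until_corruption` are hypothetical names.

```python
import struct
import zlib


def encode_record(payload: bytes) -> bytes:
    """Frame a payload as: length (4 bytes) + payload + CRC32 (4 bytes)."""
    return (
        struct.pack(">I", len(payload))
        + payload
        + struct.pack(">I", zlib.crc32(payload))
    )


def read_until_corruption(data: bytes):
    """Return (records, offset): every record before the first bad one,
    and the byte offset at which reading stopped."""
    records = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            break  # truncated length field
        (length,) = struct.unpack_from(">I", data, offset)
        end = offset + 4 + length + 4
        if end > len(data):
            break  # record claims more bytes than the file holds
        payload = data[offset + 4 : offset + 4 + length]
        (stored_crc,) = struct.unpack_from(">I", data, offset + 4 + length)
        if zlib.crc32(payload) != stored_crc:
            break  # checksum mismatch: stop at the first corruption
        records.append(payload)
        offset = end
    return records, offset


if __name__ == "__main__":
    frames = [encode_record(p) for p in (b"put-1", b"put-2", b"put-3")]
    damaged = bytearray(b"".join(frames))
    damaged[len(frames[0]) + 4] ^= 0xFF  # flip a byte inside the second payload

    recovered, stopped_at = read_until_corruption(bytes(damaged))
    print(recovered, stopped_at)
```

Everything before the damaged record is recovered and everything after it is skipped, including the intact third record, which is why the amount of recoverable data depends on where in the file the corruption sits.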
-
Re: Recovering corrupt HLog files
Bryan Beaudreault 2012-07-01, 05:07
Thanks all for the additional input. I do not think the HLogs are corrupted any longer; at least, I think the corruption was because we had also lost a good portion of our datanodes. We have since recovered all the datanodes, so they are good. We will look into creating an executable jar out of your WALPlayer class if it comes to that. Right now we are happy that the two systems powered by this were engineered accordingly: idempotent, fail-fast Hadoop jobs for one, and a BookKeeper WAL for the other.