Vladimir Rodionov
2011-11-02, 20:24
Todd Lipcon
2011-11-02, 20:27
Vladimir Rodionov
2011-11-02, 20:31
Stack
2011-11-02, 21:56
Vladimir Rodionov
2011-11-02, 22:15
Jean-Daniel Cryans
2011-11-03, 17:34
Vladimir Rodionov
2011-11-03, 18:56
Stack
2011-11-03, 19:00
Vladimir Rodionov
2011-11-03, 20:03
Vladimir Rodionov
2011-11-04, 21:37
Vladimir Rodionov
2011-11-04, 21:47
Stack
2011-11-04, 22:13
Cosmin Lehene
2011-11-07, 17:07
-
HBase 0.90.4 missing data in production
Vladimir Rodionov 2011-11-02, 20:24
Even with WAL enabled.

HBase version: 0.90.4
Hadoop: 0.20.2+320 (CDH3)

When the M/R job finishes loading data into the HBase table, everything is OK right after that: no missing data. We can confirm that by running our own internal tool. But when we restart the HBase cluster and run the checker again, we regularly find missing rows in the table. The total number of rows in the table decreases. The number of missing rows is small, about 0.1-0.3% of the total. After the first restart, subsequent restarts do not affect the total number of rows. It seems that we lose rows only during the first restart.

The table's TTL is 1 year. There is a slim chance that we load data with timestamps more than one year in the past, but that does not explain the difference between the total number of rows before and after the cluster restart.

All RS are time-synced.

In the Master log I do not see any WARNs or ERRORs during the cluster restart. In the RS logs I see a lot of:

2011-11-02 00:16:07,620 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/us01-ciqps1-grid01.carrieriq.com,60020,1320187507171/us01-ciqps1-grid01.carrieriq.com%3A60020.1320192949451, entries=76, filesize=68053806. New hlog /hbase/.logs/us01-ciqps1-grid01.carrieriq.com,60020,1320187507171/us01-ciqps1-grid01.carrieriq.com%3A60020.1320192967380
2011-11-02 00:16:07,621 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2011-11-02 00:16:07,621 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
java.io.IOException: Reflection
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1002)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:955)
    at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1483)
    at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1392)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2591)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
    ... 10 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeThreads(DFSClient.java:3306)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3216)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
    at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
    ... 14 more

These are probably not all of the ERRORs and FATALs. I am continuing the investigation and will post my other findings later.

Can someone tell me what else I should check?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Todd Lipcon [[EMAIL PROTECTED]]
Sent: Wednesday, November 02, 2011 12:53 PM
To: [EMAIL PROTECTED]
Subject: Re: HBASE-4120 table level priority

It's up to each committer to decide how they want to allocate their time, of course, but I'd like to encourage folks to hold off on spending a lot of time on new 0.94 feature merges until we've gotten 0.92.0 stabilized and out the door. Big features like QoS need a fair amount of review, both on the general approach and on the specific implementation, before they can get merged. It's hard to find time for a thorough review when we're in the finishing stages of stabilizing a release.

-Todd

On Wed, Nov 2, 2011 at 12:23 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

Todd Lipcon
Software Engineer, Cloudera

Confidentiality Notice: The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or [EMAIL PROTECTED] and delete or destroy any copy of this message and its attachments.
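Editor's sketch: the TTL argument in the message above can be checked with simple arithmetic. The Python below is only an illustration under stated assumptions: "1 year" is taken as 365 days, a cell is treated as expired once its age exceeds the TTL, and the names `is_expired` and `newly_expired` are hypothetical helpers, not HBase APIs. It shows that only cells whose timestamps fall in a window as long as the restart itself can cross the TTL boundary between the two checker runs.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the table's "1 year" TTL is configured as 365 days.
TTL = timedelta(days=365)


def is_expired(cell_ts_ms, now, ttl=TTL):
    """Return True if a cell with timestamp cell_ts_ms (ms since epoch)
    is older than ttl at wall-clock time `now` (simplified expiry rule)."""
    cell_time = datetime.fromtimestamp(cell_ts_ms / 1000, tz=timezone.utc)
    return now - cell_time > ttl


def newly_expired(timestamps_ms, before, after):
    """Timestamps that were still live at `before` (first checker run)
    but are past the TTL at `after` (checker run following the restart)."""
    return [ts for ts in timestamps_ms
            if not is_expired(ts, before) and is_expired(ts, after)]
```

If the restart takes an hour, `newly_expired` can only return cells written in one specific hour roughly a year earlier, which is consistent with the point that TTL alone is unlikely to account for 0.1-0.3% of all rows.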
-
Re: HBase 0.90.4 missing data in production
Todd Lipcon 2011-11-02, 20:27
Are you restarting your HDFS cluster underneath your running HBase cluster? Since you're running a rather old CDH3 (+320 is CDH3b2 if I remember correctly, something like 18 months old), you are missing a couple of bug fixes, and the bugs they address can cause data loss if you restart HDFS under HBase.

Todd Lipcon
Software Engineer, Cloudera
-
RE: HBase 0.90.4 missing data in production
Vladimir Rodionov 2011-11-02, 20:31
No, I restart only the HBase cluster.

It seems that the FATAL error is the only one I can find in the RS log files. No ERRORs.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]
-
Re: HBase 0.90.4 missing data in production
Stack 2011-11-02, 21:56
On Wed, Nov 2, 2011 at 1:24 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
> We can confirm that by running our own internal tool.

What's this tool doing, Vladimir? Is it running against the HBase API?

> It seems, that we loose only during first restart

How are you doing the restart? Are you killing regionservers? On restart, are we splitting WAL logs, or is it a 'clean' restart where no WALs are split?

Can you figure out what the missing data is? Is it all from the same one or two regions?

> Table's TTL = 1 year. There is a slim chance that we load data with timestamps more than one year behind, but it does not explain the difference
> between total number of rows before and after cluster's restart.

I'd think that if your 'internal tool' found the stuff before the restart, then this is probably OK (a year could have elapsed over the restart, I suppose, but that would be odd...).

> All RS are time synched.
>
> In Master log I do not see any WARN or ERRORs during cluster re-start. In RS logs I see a lot of:
>
> 2011-11-02 00:16:07,620 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/us01-ciqps1-grid01.carrieriq.com,60020,1320187507171/us01-ciqps1-grid01.carrieriq.com%3A60020.1320192949451, entries=76, filesize=68053806. New hlog /hbase/.logs/us01-ciqps1-grid01.carrieriq.com,60020,1320187507171/us01-ciqps1-grid01.carrieriq.com%3A60020.1320192967380
> 2011-11-02 00:16:07,621 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2011-11-02 00:16:07,621 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
> java.io.IOException: Reflection
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1002)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:955)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1483)
>     at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1392)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2591)
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>     ... 10 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeThreads(DFSClient.java:3306)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3216)
>     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>     at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>     ... 14 more

Perhaps a newer CDH would have a fix for this?

> This is probably not all ERRORS and FATALs I am continuing investigation and will post my other findings later.

Let us know.
St.Ack
-
RE: HBase 0.90.4 missing data in production
Vladimir Rodionov 2011-11-02, 22:15
1. The tool opens a cursor to the database (RDBMS) and reads the row IDs. We keep the ROWIDs in the RDBMS.
2. Then, using the first N (configurable) row IDs, it fetches these rows from HBase using the HBase bulk get API.

HBase restart:

stop-hbase
start-hbase

on the Master node.

I do not kill RS - it's dangerous :)

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]
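Editor's sketch: the two-step checker described in this message could look roughly like the Python below. This is not the internal tool itself, which is not shown in the thread. `fetch_rows` is a hypothetical stand-in for whatever performs the HBase bulk get and reports which of the requested row IDs exist; `limit` plays the role of the configurable N, and `batch_size` is an assumed parameter.

```python
import itertools


def batches(row_ids, size):
    """Yield consecutive lists of at most `size` row IDs."""
    batch = []
    for rid in row_ids:
        batch.append(rid)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def find_missing(row_ids, fetch_rows, limit, batch_size=100):
    """Check the first `limit` row IDs (e.g. read from the RDBMS cursor)
    and return those that fetch_rows did not find.

    fetch_rows(batch) is a hypothetical callable that takes a list of
    row IDs and returns the set of IDs present in the table.
    """
    missing = []
    for batch in batches(itertools.islice(row_ids, limit), batch_size):
        found = fetch_rows(batch)
        missing.extend(rid for rid in batch if rid not in found)
    return missing
```

Running the same function before and after the restart and comparing the two `missing` lists gives the exact row IDs that disappeared, which is what the follow-up questions in this thread ask for.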
-
Re: HBase 0.90.4 missing data in production
Jean-Daniel Cryans 2011-11-03, 17:34
Important questions that Stack asked that remain unanswered:

> On restart, are we splitting wal logs or is it a 'clean' restart where no
> wals are split?

> Can you figure what the missing data is? Is it all from same one or
> two regions?

Basically, if there is indeed missing data, we need to figure out where it's coming from.

A bigger log snippet could also be helpful. The stack traces you pasted do show something that "looks" bad, but IIRC that NPE only comes out when the file was already closed by something else, so it could have happened while the region server was shutting down... or not, I can't tell without the context.

Thx,

J-D
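Editor's sketch: one way to answer the question of whether the missing rows all come from the same one or two regions is to bucket the missing row keys by region start key. The Python below assumes the list of region start keys has already been obtained (for example from the table's region listing) and is sorted, with the first region's start key being the empty byte string. Each region covers the half-open range from its start key up to the next start key, and keys compare byte-wise. The function names are hypothetical.

```python
import bisect
from collections import Counter


def region_for(row_key, start_keys):
    """Return the start key of the region containing row_key.

    start_keys must be sorted and begin with b"" (the first region).
    A region covers [start_key, next_start_key).
    """
    idx = bisect.bisect_right(start_keys, row_key) - 1
    return start_keys[idx]


def missing_per_region(missing_keys, start_keys):
    """Count missing row keys per region, keyed by region start key."""
    return Counter(region_for(k, start_keys) for k in missing_keys)
```

If nearly all of the counts land in one or two regions, that points at specific regions (and the servers that hosted them) rather than at a table-wide problem.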
-
RE: HBase 0.90.4 missing data in production
Vladimir Rodionov 2011-11-03, 18:56
I cannot reproduce the bug today. No missing rows after the cluster restart. I will answer Stack's questions as soon as I hit this problem again.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]
-
Re: HBase 0.90.4 missing data in production
Stack 2011-11-03, 19:00
On Thu, Nov 3, 2011 at 11:56 AM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
> I can not reproduce the bug today. No missing rows after cluster restart. I will answer
> Stack's question as soon as I hit this problem again.

Can you compare logs from the bad and good runs, Vladimir?

St.Ack
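Editor's sketch: comparing logs from a bad run and a good run can be partly automated by counting WARN, ERROR and FATAL entries per logger and message in each log and listing where the counts differ. The Python below assumes the log line layout seen in this thread (timestamp, level, logger, colon, message); `summarize` and `diff_runs` are hypothetical helper names. Messages that embed paths or IDs will not group together, so this is a coarse first pass only.

```python
import re
from collections import Counter

# Matches lines such as:
# 2011-11-02 00:16:07,621 FATAL org.apache...HLog: Could not append. ...
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.$]+): (?P<msg>.*)$"
)


def summarize(lines, levels=("WARN", "ERROR", "FATAL")):
    """Count log entries per (level, logger, message) for the given levels.
    Lines that do not match the layout (e.g. stack frames) are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") in levels:
            counts[(m.group("level"), m.group("logger"), m.group("msg"))] += 1
    return counts


def diff_runs(bad, good):
    """Return {key: (count_in_bad, count_in_good)} where the counts differ."""
    return {k: (bad[k], good[k])
            for k in set(bad) | set(good) if bad[k] != good[k]}
```

For example, `diff_runs(summarize(open(bad_path)), summarize(open(good_path)))` with two region server log paths (hypothetical variables) returns an empty dict when both runs show the same WARN/ERROR/FATAL profile.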
-
RE: HBase 0.90.4 missing data in production
Vladimir Rodionov 2011-11-03, 20:03
I would say that they are identical
This is Nov-2 RegionServer1:
2011-11-02 00:12:29,352 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:12:35,220 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:12:37,639 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:12:42,865 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:12:49,710 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:12:54,079 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:12:55,692 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:13:01,779 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:15:03,044 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:15:05,686 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:16:07,621 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:16:19,499 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:17:23,085 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:17:29,428 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:18:23,436 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:18:25,183 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:19:02,230 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:19:04,624 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:19:06,577 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:20:22,306 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:20:25,645 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:20:27,662 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:21:31,655 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:22:13,565 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 00:22:15,892 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:05:11,911 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:10:00,359 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:10:02,612 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:10:31,625 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:21:06,955 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:23:33,967 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:23:37,270 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:24:45,313 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:24:48,087 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:26:49,583 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:27:52,948 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:28:54,638 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 06:29:20,036 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:03:02,271 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:03:04,420 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:04:20,810 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:05:48,922 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:06:35,397 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:07:07,572 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:07:27,270 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:12:04,725 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:12:28,737 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:13:19,586 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:14:12,752 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
2011-11-02 20:15:27,228 FATAL org.apache.hadoop.hbase
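The FATAL entries in this excerpt fall into three windows on Nov-2 (roughly 00:12 to 00:22, 06:05 to 06:29, and from 20:03 onward). A minimal sketch for tallying them per hour, so the bursts can be lined up against the M/R load times, might look like the following. The regular expression and the function name are illustrative assumptions written for the log format quoted in this thread; they are not part of HBase.

```python
import re
from collections import Counter

# Matches the HLog FATAL lines quoted in this thread. The pattern is an
# assumption tied to this log layout, not a general log4j parser.
FATAL_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2},\d{3} FATAL "
    r"org\.apache\.hadoop\.hbase\.regionserver\.wal\.HLog: Could not append"
)


def fatal_appends_per_hour(log_text):
    """Return a Counter keyed by (date, hour) of 'Could not append' FATALs."""
    counts = Counter()
    for date, hour in FATAL_RE.findall(log_text):
        counts[(date, int(hour))] += 1
    return counts


# A few entries copied from the excerpt; the last one is truncated in the
# archive, so it does not match the pattern and is not counted.
sample = (
    "2011-11-02 00:12:29,352 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog\n"
    "2011-11-02 00:12:35,220 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog\n"
    "2011-11-02 06:05:11,911 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog\n"
    "2011-11-02 20:14:12,752 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog\n"
    "2011-11-02 20:15:27,228 FATAL org.apache.hadoop.hbase\n"
)

for (date, hour), n in sorted(fatal_appends_per_hour(sample).items()):
    print(f"{date} {hour:02d}:00  {n}")
```

Because the pattern is not anchored to line starts, it also works on log text where the entries have been run together on one line.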
-
RE: HBase 0.90.4 missing data in production
Vladimir Rodionov 2011-11-04, 21:37
I was able to reproduce the issue again today. Out of the first 10K rows, 8 were missing after the cluster restart. They come from 4 different regions.
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]
-
RE: HBase 0.90.4 missing data in production
Vladimir Rodionov 2011-11-04, 21:47
There are no ERRORs or FATALs in the RegionServer logs except the one FATAL I have already posted (HLog sync).
I am asking for expert advice: what else should I check?
-
Re: HBase 0.90.4 missing data in production
Stack 2011-11-04, 22:13
On Fri, Nov 4, 2011 at 2:47 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
> There are no ERRORs and FATALs in Region Servers logs except one FATAL I have already posted. HLog sync
> I am asking expert advise: what else should I check?

8 rows out of 10k?

Can you find what the 8 are? Can you go to the source data to figure out which? Can you see anything particular about these 8 rows compared with, say, those around them? Once you have the 8 rows, can you see where they were inserted? Can you figure out the time of insertion? Can you see if anything happened to the data at that time? Can you look in the regions? Look at the hfiles under the region, using the tool here: http://hbase.apache.org/book.html#hfile_tool. See if the rows are in there.

St.Ack
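The first step above (working out which rows are missing and which regions they belong to) can be sketched as a set difference followed by a lookup against the region start keys. Everything in this sketch is an illustrative assumption: the row keys and region boundaries are made up, the helper names are hypothetical, and keys are compared as ASCII strings, which only approximates HBase's byte-wise key ordering.

```python
from bisect import bisect_right


def missing_rows(expected_keys, found_keys):
    """Row keys present in the source data but absent from the table scan."""
    return sorted(set(expected_keys) - set(found_keys))


def group_by_region(row_keys, region_start_keys):
    """Group row keys by the start key of the region that would hold them.

    region_start_keys must be sorted and begin with "" (the empty start key
    of the first region), so every row key falls into some range.
    """
    groups = {}
    for key in row_keys:
        idx = bisect_right(region_start_keys, key) - 1
        groups.setdefault(region_start_keys[idx], []).append(key)
    return groups


# Hypothetical data, for illustration only.
expected = ["a01", "a02", "f10", "m05", "m06", "t99"]  # keys from source data
found = ["a01", "f10", "m06"]                          # keys seen after restart
starts = ["", "g", "p"]                                # region start keys

lost = missing_rows(expected, found)
by_region = group_by_region(lost, starts)
print(lost)
print(by_region)
```

With the missing keys grouped by region, the hfiles under each affected region can be inspected for those specific rows instead of scanning everything.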
-
Re: HBase 0.90.4 missing data in production
Cosmin Lehene 2011-11-07, 17:07
Dropping in.
Vladimir, can you check if there's any time skew between servers? (I can't remember the exact details, but we once ended up with .META. region location cells that were dated in the future because the clock was skewed on one machine, so regions were not visible for a few hours.)

Also, if you're willing to dig deeper, you could try loading individual HFiles and counting the rows directly.

Cosmin
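A rough sketch of the time-skew check suggested here: given one clock reading per server taken at approximately the same moment, flag any server whose clock is far from the median. How the readings are collected is left out, and the host names, timestamps, and one-second threshold are assumptions chosen for illustration.

```python
from statistics import median


def skewed_hosts(clock_millis, max_skew_ms=1000):
    """Return {host: offset_ms} for hosts whose clock differs from the
    median reading by more than max_skew_ms."""
    mid = median(clock_millis.values())
    return {
        host: ts - mid
        for host, ts in clock_millis.items()
        if abs(ts - mid) > max_skew_ms
    }


# Hypothetical epoch-millisecond readings sampled at about the same time.
readings = {
    "rs1": 1320426000000,
    "rs2": 1320426000150,
    "rs3": 1320433200000,  # roughly two hours ahead of the others
}

print(skewed_hosts(readings))
```

A server running ahead would write cells with future timestamps, which is the situation described above for the .META. cells.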