HBase >> mail # user >> Weird Replication exception


Weird Replication exception
Hi,

I'm getting a weird error on a cluster I'm testing replication with.

I have two clusters set up, each in its own DC (on different continents).
Each has 1 master and 3 RS.

I did all the required setup, started replication, and pushed some data
into the master cluster. I then had an issue where the slave (peer) cluster
went dead (all RS failed to contact the master), so replication couldn't
work. This happened right before the weekend, so it was down for 3 days.

Now I'm back in the office. I got the slave cluster back up (just the RS),
and I'm seeing a nasty exception on one of the RS of the master cluster:

2013-06-02 04:40:45,903 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 0 rs from peer cluster # c
2013-06-02 04:40:45,903 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: c has 0 region servers
2013-06-02 04:40:46,903 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Since we are unable to replicate, sleeping 1000 times 10
2013-06-02 04:40:57,019 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 0 rs from peer cluster # c
2013-06-02 04:40:57,019 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: c has 0 region servers
2013-06-02 04:40:58,019 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Since we are unable to replicate, sleeping 1000 times 10
2013-06-02 04:41:08,134 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 1 rs from peer cluster # c
2013-06-02 04:41:08,134 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer a72-246-95-86,60020,1370147274693
2013-06-02 04:41:08,672 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 1
2013-06-02 04:41:08,971 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #172.25.98.74%2C60020%2C1369903540894.1369921571842 for position 49882 in hdfs://172.25.98.73:8020/hbase/.logs/172.25.98.74,60020,1369903540894/172.25.98.74%2C60020%2C1369903540894.1369921571842
2013-06-02 04:41:08,973 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicated in total: 1
2013-06-02 04:41:08,973 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication 172.25.98.74%2C60020%2C1369903540894.1369921571842 at 49882
2013-06-02 04:41:08,977 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:0 and seenEntries:4 and size: 0
2013-06-02 04:41:08,977 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #172.25.98.74%2C60020%2C1369903540894.1369921571842 for position 69368 in hdfs://172.25.98.73:8020/hbase/.logs/172.25.98.74,60020,1369903540894/172.25.98.74%2C60020%2C1369903540894.1369921571842
2013-06-02 04:41:08,978 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Removing 0 logs in the list: []
2013-06-02 04:41:08,978 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Nothing to replicate, sleeping 1000 times 1
2013-06-02 04:41:09,978 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication 172.25.98.74%2C60020%2C1369903540894.1369921571842 at 69368
2013-06-02 04:41:09,980 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:0 and seenEntries:0 and size: 0
2013-06-02 04:41:09,980 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication 172.25.98.74%2C60020%2C1369903540894.1369925171871 at 0
2013-06-02 04:41:09,981 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: c Got:
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:742)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:67)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:563)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:369)
2013-06-02 04:41:09,981 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unable to open a reader, sleeping 1000 times 1
2013-06-02 04:41:10,981 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication 172.25.98.74%2C60020%2C1369903540894.1369925171871 at 0
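
The log names in the messages above are URL-encoded (%2C stands for a comma), while the directory under /hbase/.logs uses plain commas. Here is a minimal sketch for turning a log name into the path to look up in HDFS. It assumes the layout seen in the "Going to report log" lines holds for every log; wal_hdfs_path and logs_root are names made up for illustration and are not part of HBase.

```python
from urllib.parse import unquote


def wal_hdfs_path(encoded_log_name, logs_root="/hbase/.logs"):
    """Build the HDFS path for a log name as it appears in the replication log.

    Assumes the layout <logs_root>/<host,port,startcode>/<encoded log name>,
    which is what the "Going to report log" lines show.
    """
    # Everything before the last '.' is the encoded server name
    # (host%2Cport%2Cstartcode); the rest is the file's numeric suffix.
    encoded_server, _, _suffix = encoded_log_name.rpartition(".")
    # The directory name uses plain commas, so decode "%2C" -> ",".
    server_dir = unquote(encoded_server)
    # The file itself keeps its encoded name.
    return f"{logs_root}/{server_dir}/{encoded_log_name}"


if __name__ == "__main__":
    # The log that fails to open at position 0 in the output above.
    print(wal_hdfs_path("172.25.98.74%2C60020%2C1369903540894.1369925171871"))
```

The printed path can be passed to hadoop fs -ls to check whether the file exists and how large it is.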

OK, so I took a look in HDFS for these logs, and here's what I found for
RS 74:

[EMAIL PROTECTED]:~$ hadoop fs -ls /hbase/.logs
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2013-05-30 14:46 /hbase/.logs/172.25.98.74,60020,1369903540894
drwxr-xr-x   - hadoop supergroup          0 2013-05-30 12:46 /hbase/.logs/172.25.98.75,60020,1369903540900
drwxr-xr-x   - hadoop supergroup          0 2013-05-30 14:46 /hbase/.logs/172.25.98.76,60020,1369903540898
[EMAIL PROTECTED]:~$ hadoop fs -ls /hbase/.logs//hbase/.logs/172.25.98.74,60020,1369903540894
[EMAIL PROTECTED]:~$ hadoop fs -ls /hbase/.logs/172.25.98.74
,60020,1369