Re: Never ending distributed log split
Jean-Marc Spaggiari 2012-08-03, 20:15
2012/8/3, Jean-Daniel Cryans <[EMAIL PROTECTED]>:
> On Fri, Aug 3, 2012 at 8:15 AM, Jean-Marc Spaggiari
> <[EMAIL PROTECTED]> wrote:
>> Me again ;)
>>
>> I did some more investigation.
>
> It would really help to see the region server log although the fsck
> output might be enough.

I looked under every directory and only one contains a file.

http://pastebin.com/8Fea2EnA

It seems to be related to node1. On this server, everything seems to
be started correctly:
hadoop@node1:~$ /usr/local/jdk1.7.0_05/bin/jps
2211 DataNode
2938 Jps
2136 TaskTracker

hbase@node1:~$ /usr/local/jdk1.7.0_05/bin/jps
2419 HRegionServer
3708 Jps

In the node1 region server logs, I can see the same information,
namely that the file is not hosted anywhere.

2012-08-03 15:01:31,216 WARN org.apache.hadoop.hdfs.DFSClient: DFS
Read: java.io.IOException: Could not obtain block:
blk_4965382127800577452_15852
file=/hbase/.logs/node1,60020,1343908057567-splitting/node1%2C60020%2C1343908057567.1343914548297
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2266)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2060)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2221)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at java.io.DataInputStream.readFully(DataInputStream.java:169)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:688)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:850)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:763)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:384)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:351)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:266)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
        at java.lang.Thread.run(Thread.java:722)

> BTW you'll find 0.94.1 RC1 here:
> http://people.apache.org/~larsh/hbase-0.94.1-rc1/

Super, thanks! I will most probably try it instead of 0.94.0.
>> And I found that:
>>
>> http://pastebin.com/Bedm6Ldy
>>
>> Seems that no region is serving my logs. That's strange because all my
>> servers are up and fsck is telling me that FS is clean.
>
> I don't get the "Seems that no region is serving my logs" part. A
> region doesn't serve logs, it serves HFiles. You meant to say
> DataNode?

I was talking about the files under /hbase/.logs. Based on the
directory name, I thought they were some kind of logs. Whatever this
file is supposed to be for, it seems it's not served by any datanode.
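
For what it's worth, those files can be listed straight from HDFS
(they are the region servers' write-ahead logs, as far as I
understand; paths taken from the stack trace above):

hbase@node1:~$ hadoop fs -ls /hbase/.logs
hbase@node1:~$ hadoop fs -ls /hbase/.logs/node1,60020,1343908057567-splitting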
>> Can I just delete those files? What's the impact of such a delete? I
>> don't really worry about losing some data. It's a test environment.
>> But I really need it to start again.
>
> I wonder if it's related to:
> https://issues.apache.org/jira/browse/HBASE-6401
>
> Did you remove a datanode from the cluster as part of the maintenance?

It might be related to this JIRA. Yes, I stopped all the datanodes
for the maintenance (I had to work on the power supply...). I had to
do that promptly, so I "just" stopped everything with init 0.
That's fine. Nothing was happening in the cluster for hours, so I'm
not really expecting to lose anything. I will try to delete the
file...
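
For the record, this is roughly what I plan to run, sidelining the
file first rather than deleting it outright, in case it turns out to
be needed (path taken from the stack trace above):

hbase@node1:~$ hadoop fs -mv /hbase/.logs/node1,60020,1343908057567-splitting/node1%2C60020%2C1343908057567.1343914548297 /tmp/

or, to get rid of it completely:

hbase@node1:~$ hadoop fs -rm /hbase/.logs/node1,60020,1343908057567-splitting/node1%2C60020%2C1343908057567.1343914548297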
Here are the logs where we can see the file creation:
http://pastebin.com/HBc28zab
Nothing weird in them, I think.

When I removed the file, the region server crashed and had to be restarted.

The restart was not working:
2012-08-03 16:07:49,119 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: remote error
telling master we are up
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hbase.PleaseHoldException: Server
serverName=node1,60020

2012-08-03 16:07:46,112 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: remote error
telling master we are up
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hbase.PleaseHoldException: Server
serverName=node1,60020,1344024290513 rejected; we already have
node1,60020,1343998593757 registered with same hostname and port
        at org.apache.hadoop.hbase.master.ServerManager.checkAlreadySameHostPort(ServerManager.java:194)
        at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:153)
        at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:860)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)

        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:918)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.jav
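
If I read this correctly, the master still has the previous instance
(node1,60020,1343998593757) registered for that hostname and port,
and it rejects the new startcode until that entry goes away, normally
once the old server's ZooKeeper session expires. Something like this
should show which region servers ZooKeeper still knows about
(assuming the default /hbase parent znode and that the zkcli command
is available in this release):

hbase@node1:~$ hbase zkcli
ls /hbase/rs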