Re: Long time fail over when using QJM
Todd Lipcon 2013-08-29, 16:53
If you're seeing those log messages, the SBN (standby NameNode) was already
active at that time: it only logs that message when it successfully writes
transactions. So the failover must have completed before the log lines
you're looking at.
On Thu, Aug 29, 2013 at 1:18 AM, Mickey <[EMAIL PROTECTED]> wrote:
> Hi all,
> I have been testing QJM HA and it has always worked well. But yesterday I
> ran into a very long failover with QJM. The test is based on CDH4.3.0.
> The attachments are the standby NameNode's and the JournalNode's logs.
> The network cable on the active NameNode (also a DataNode) was pulled out
> at about 07:24. In the standby NameNode log I found lines like these:
> 2013-08-28 07:24:51,122 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 1Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 41 42
> 2013-08-28 07:36:14,028 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 32 Total time for transactions(ms): 3Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 9 49 46
> These entries look normal. The problem is that there are no log entries
> at all in the nearly 12 minutes between these two lines. No long GC pause
> occurred, so the code seems to have blocked somewhere. Unfortunately, I
> forgot to capture a jstack dump.
> I look forward to your response.
> Best regards,
Software Engineer, Cloudera
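Silent stretches like the one described in the quoted message can be located by scanning a NameNode log for consecutive timestamped entries that are far apart. The following is a minimal Python sketch, not a Hadoop tool; it assumes each log entry starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp (as in the quoted excerpt), and the helper name `find_gaps` and the 5-minute default threshold are hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Assumed log4j-style prefix, e.g. "2013-08-28 07:24:51,122 INFO ..."
TS_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (previous_ts, next_ts, gap) for each silence longer than threshold.

    Hypothetical helper, not part of Hadoop. Lines without a leading
    timestamp (wrapped continuation lines, stack traces) are skipped.
    """
    gaps = []
    prev = None
    for line in lines:
        m = TS_PATTERN.match(line)
        if not m:
            continue
        # %f accepts 1 to 6 digits, so "122" parses as 122 milliseconds
        ts = datetime.strptime(m.group(1), TS_FORMAT)
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


# The two FSEditLog lines from the quoted message (truncated after the prefix)
log = [
    "2013-08-28 07:24:51,122 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 ...",
    "2013-08-28 07:36:14,028 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 32 ...",
]
gaps = find_gaps(log)
for start, end, gap in gaps:
    print(f"{start} -> {end}: silent for {gap}")  # gap is 0:11:22.906000
```

The threshold is a judgment call. Keep in mind the point made in the reply: these FSEditLog statistics lines are only logged while the NameNode is successfully writing transactions, so a gap between two of them does not by itself show when the failover happened; earlier log lines need to be examined for that.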