HDFS >> mail # dev >> Long time fail over when using QJM


Re: Long time fail over when using QJM
If you're seeing those log messages, the SBN was already active at that
time. It only logs that message when successfully writing transactions. So,
the failover must have already completed before the logs you're looking at.

-Todd

On Thu, Aug 29, 2013 at 1:18 AM, Mickey <[EMAIL PROTECTED]> wrote:

> Hi, all
> I have been testing QJM HA and it has always worked well. But yesterday I hit
> a very long failover with QJM. The test is based on CDH4.3.0.
> The attachment contains the standby NameNode and JournalNode logs.
> The network cable on the active NameNode (also a DataNode) was pulled out at
> about 07:24. In the standby NameNode log I found entries like this:
> 2013-08-28 07:24:51,122 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1
> Total time for transactions(ms): 1Number of transactions batched in Syncs:
> 0 Number of syncs: 0 SyncTimes(ms): 0 41 42
> 2013-08-28 07:36:14,028 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions:
> 32 Total time for transactions(ms): 3Number of transactions batched in
> Syncs: 0 Number of syncs: 1 SyncTimes(ms): 9 49 46
>
> The entries themselves look normal. The problem is that nothing was logged
> between these two lines for about 12 minutes. No long GC pause occurred. It
> seems the code was blocked somewhere. Unfortunately, I forgot to capture the
> jstack output T_T.
>
> Looking forward to your response.
>
> Best regards,
> Mickey
>
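
To pin down where the silent window in a log like the one quoted above begins
and ends, one option is to scan the file for large jumps between consecutive
timestamps. The following is only a minimal sketch: the function name
find_gaps and the one-minute threshold are hypothetical, and it assumes the
log4j-style "yyyy-MM-dd HH:mm:ss,SSS" prefix seen in the quoted lines.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp prefix seen in the quoted log lines,
# e.g. "2013-08-28 07:24:51,122 INFO ..."
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def find_gaps(lines, threshold=timedelta(minutes=1)):
    """Return (start, end, gap) for each pair of consecutive timestamped
    lines that are at least `threshold` apart. Hypothetical helper."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # wrapped or continuation line without a timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        current += timedelta(milliseconds=int(match.group(2)))
        if previous is not None and current - previous >= threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# The two timestamped lines from the message above.
sample = [
    "2013-08-28 07:24:51,122 INFO",
    "org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1",
    "2013-08-28 07:36:14,028 INFO",
]

for start, end, gap in find_gaps(sample):
    print(f"no log output between {start} and {end} (gap {gap})")
```

Run against the two quoted entries, this reports a single gap of roughly 11
minutes 23 seconds. Running it over the full standby NameNode and JournalNode
logs would show whether the silence starts before or after the first FSEditLog
line, which bears on when the failover actually completed.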

--
Todd Lipcon
Software Engineer, Cloudera