HDFS >> mail # dev >> Probably a bug in FsEditLog


Anty 2013-04-07, 09:48
Harsh J 2013-04-08, 08:33
Anty 2013-04-08, 09:14
Re: Probably a bug in FsEditLog
Anty,

Worth checking if you can spot the bug in branch-1 as well:
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
(and other refs under
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/server/namenode/)

On Mon, Apr 8, 2013 at 2:44 PM, Anty <[EMAIL PROTECTED]> wrote:
> @Harsh
> I'm using CDH3u4. However, the FSEditLog processing logic in CDH3u4 and
> Hadoop 1.0.2 is almost the same.
> So I'm not sure whether it is appropriate to file a JIRA.
>
> On Mon, Apr 8, 2013 at 4:33 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Thanks for analyzing and reporting this, Anty.
>>
>> What version of Apache Hadoop 1.x are you encountering this on? If
>> you've spotted the code issue on branch-1, please do log an HDFS JIRA
>> with some NN logs and your other details.
>>
>> On Sun, Apr 7, 2013 at 3:18 PM, Anty <[EMAIL PROTECTED]> wrote:
>> > Hi all,
>> >
>> > In our cluster, we configure the NameNode to write to both local and
>> > NFS-mounted directories. When the NFS-mounted directory is inaccessible,
>> > the NameNode should keep running without error, but our NameNode crashes
>> > with the following stack trace.
>> >
>> >> 2013-04-02 23:35:21,535 WARN org.apache.hadoop.hdfs.server.common.Storage: Removing storage dir /nfs2-mount/onest3/dfs/name
>> >> 2013-04-02 23:35:21,536 FATAL org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to find edits stream with IO error
>> >> java.lang.Exception: Unable to find edits stream with IO error
>> >>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.fatalExit(FSEditLog.java:430)
>> >>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.removeEditsStreamsAndStorageDirs(FSEditLog.java:519)
>> >>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:1139)
>> >>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1641)
>> >>         at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:689)
>> >>         at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >>         at java.lang.reflect.Method.invoke(Method.java:597)
>> >>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>> >>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>> >>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
>> >>         at java.security.AccessController.doPrivileged(Native Method)
>> >>         at javax.security.auth.Subject.doAs(Subject.java:396)
>> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>> >>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
>> >> 2013-04-02 23:35:21,539 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>> >
>> > According to the stack trace, when the NameNode tries to sync the edit
>> > log, it detects that the NFS-mounted directory is inaccessible and
>> > attempts to remove the corresponding stream from FSEditLog#editStreams.
>> > However, it finds that the edit stream for the NFS mount has already
>> > been removed. In this situation, the NameNode simply aborts.
>> >
>> > After looking through the HDFS source code, I found another code path
>> > that removes edit streams from FSEditLog#editStreams, which can cause
>> > the race condition above. In the method FSEditLog#logEdit:
>> >
>> >
>> >>     if (getNumEditStreams() < 1)
>> >>     {
>> >>         throw new AssertionError("No edit streams to log to");
>> >>     }
>> >>     long start = FSNamesystem.now();
>> >>     for (int idx = 0; idx < editStreams.size(); idx++)
>> >>     {
>> >>         EditLogOutputStream eStream = editStreams.get(idx);
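
The race described in the quoted report can be illustrated with a small, single-threaded model. This is only a sketch under stated assumptions: `EditStreamRaceSketch`, `removeFailedStream`, and `replayRace` are hypothetical names, the streams are plain strings, and the "fatal exit" is a flag rather than a process shutdown. It is not the actual FSEditLog code; it only replays the ordering in which two code paths both try to drop the same failed stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the reported behaviour; not Hadoop code.
class EditStreamRaceSketch {
    // Stand-in for FSEditLog#editStreams (assumption: streams modelled as names).
    private final List<String> editStreams;
    // Stand-in for the NameNode aborting; the sketch only records the request.
    private boolean fatalExitRequested = false;

    EditStreamRaceSketch(String... streams) {
        this.editStreams = new ArrayList<>(Arrays.asList(streams));
    }

    // Strict removal: a stream that is no longer in the list is treated as fatal,
    // mirroring the "Unable to find edits stream with IO error" message in the log.
    void removeFailedStream(String stream) {
        boolean wasPresent = editStreams.remove(stream);
        if (!wasPresent) {
            fatalExitRequested = true;
        }
    }

    boolean isFatalExitRequested() {
        return fatalExitRequested;
    }

    int streamCount() {
        return editStreams.size();
    }

    // Replays the ordering from the report: one path removes the failed NFS
    // stream first, then a second path tries to remove the same stream again.
    static EditStreamRaceSketch replayRace() {
        EditStreamRaceSketch log = new EditStreamRaceSketch("local", "nfs");
        log.removeFailedStream("nfs"); // first path (logEdit-like) drops the stream
        log.removeFailedStream("nfs"); // second path (logSync-like) no longer finds it
        return log;
    }

    public static void main(String[] args) {
        EditStreamRaceSketch log = replayRace();
        System.out.println("streams left: " + log.streamCount());
        System.out.println("fatal exit requested: " + log.isFatalExitRequested());
    }
}
```

In this model the healthy local stream is still present when the second removal fails, which is why the abort looks unnecessary in the scenario described above.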

Harsh J
Anty 2013-04-08, 14:22