HDFS, mail # dev - Problem with BackupNode?


Re: Problem with BackupNode?
André Oriani 2011-06-18, 03:10
Hi Ivan,

Sorry for taking so long to answer your email. I ran the test as you asked
and found that the commit below is the one that caused the breakage. I wish I
could provide a fix, but I do not have time today.
commit 27b956fa62ce9b467ab7dd287dd6dcd5ab6a0cb3
Author: Hairong Kuang <[EMAIL PROTECTED]>
Date:   Mon Apr 11 17:15:27 2011 +0000

    HDFS-1630. Support fsedits checksum. Contributed by Hairong Kuang.
    git-svn-id:
https://svn.apache.org/repos/asf/hadoop/hdfs/trunk@1091131 13f79535-47bb-0310-9956-ffa450edef68
Regards,
André Oriani
On Thu, Jun 16, 2011 at 07:31, Ivan Kelly <[EMAIL PROTECTED]> wrote:

> This seems to have been introduced here:
> https://github.com/apache/hadoop-hdfs/commit/27b956fa62ce9b467ab7dd287dd6dcd5ab6a0cb3#src/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
> The backup streams never write the version, so the loader should never try
> to read it either. I would have expected this to fail earlier, since it is
> reading junk: the stream pointer is an int past where it should be. Backup
> streams don't write the checksum either. This really should have failed the
> BackupNode unit test, but I think there are other problems with that. cf.
> https://issues.apache.org/jira/browse/HDFS-1521?focusedCommentId=13010242&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13010242
>
> Could you try again with code from April 10th?
>
> Another candidate for causing it could be HDFS-2003 which went in on the
> 8th of this month.
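
[Editor's note: the header mismatch Ivan describes above — a reader expecting a version int (and a checksum) that the backup stream never writes — can be sketched in plain Java. The class and field names below are illustrative stand-ins, not the actual HDFS classes; the real code lives in BackupImage and FSEditLogLoader.]

```java
import java.io.*;

public class HeaderMismatchDemo {
    // Writer side (analogous to the backup stream): emits edit records
    // WITHOUT a leading layout-version int or a trailing checksum.
    static byte[] writeRecords() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(9);     // opcode of the first record (made up)
            out.writeLong(101L);  // its transaction id (made up)
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader side (analogous to the post-HDFS-1630 loader): reads a
    // version int that was never written, consuming the first 4 bytes of
    // real record data as a bogus "version". Every subsequent read is
    // now an int past where it should be, yielding junk instead of a
    // clean, early failure.
    static int[] misread(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(data));
            int bogusVersion = in.readInt(); // swallows opcode + 3 txid bytes
            int junkOpcode = in.readByte();  // actually a txid byte, not an opcode
            return new int[] { bogusVersion, junkOpcode };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = misread(writeRecords());
        System.out.println("bogus version = " + result[0]
                + ", junk opcode = " + result[1]);
    }
}
```

Here the misaligned reader does not crash immediately; it silently returns garbage, which matches Ivan's surprise that the real code failed as late as it did.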
>
> On 16/06/2011 00:42, André Oriani wrote:
>
>> Hi,
>>
>> My repo is one week old, and the change I made was to modify the
>> Configuration object in BackupNode.initialize() to point the name and edit
>> dirs at other directories, so I could run both the namenode and the backup
>> node on the same machine. When I copied a file to HDFS, the exception
>> below was thrown. Has anyone seen that?
>>
>>
>> 11/06/15 17:52:22 INFO ipc.Server: IPC Server handler 1 on 50100, call
>> journal(NamenodeRegistration(localhost:8020, role=NameNode), 101, 164,
>> [B@3951f910), rpc version=1, client version=5,
>> methodsFingerPrint=302283637
>> from 192.168.1.102:56780: error: java.io.IOException: Error replaying edit
>> log at offset 13
>> Recent opcode offsets: 1
>> java.io.IOException: Error replaying edit log at offset 13
>> Recent opcode offsets: 1
>> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:514)
>> at org.apache.hadoop.hdfs.server.namenode.BackupImage.journal(BackupImage.java:242)
>> at org.apache.hadoop.hdfs.server.namenode.BackupNode.journal(BackupNode.java:251)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
>> Caused by: org.apache.hadoop.fs.ChecksumException: Transaction 1 is corrupt.
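
[Editor's note: the configuration tweak André describes — overriding the name and edits dirs inside BackupNode.initialize() so the primary namenode and the backup node can share one machine — might look roughly like the sketch below. The key names and paths are assumptions from that era's HDFS configuration style, and a plain Map stands in for Hadoop's Configuration so the snippet runs standalone.]

```java
import java.util.HashMap;
import java.util.Map;

public class BackupNodeConfSketch {
    // Roughly the kind of change described: before the backup node
    // initializes its storage, repoint its name and edits dirs at
    // directories different from the primary namenode's, so both
    // daemons can run on the same machine without colliding.
    static void redirectStorageDirs(Map<String, String> conf) {
        conf.put("dfs.name.dir", "/tmp/backupnode/name");        // assumed key name
        conf.put("dfs.name.edits.dir", "/tmp/backupnode/edits"); // assumed key name
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        redirectStorageDirs(conf);
        System.out.println(conf);
    }
}
```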