|
|
-
Nonempty files become empty after being saved to Hdfs.
Tianqiang Peter Li 2012-02-18, 22:38
Hi, guys, I am using scribe to write data directly to hdfs, it works well most of the time, but sporadically, I found some of files(1-2 out of 6000 per day) written to hdfs become empty files, interestingly, this only happens when hdfs is busy handing lots of other write request at the same time, here is log pulled from namenode's log output, it seems the allocateBlock operation following the create operation is vanished, unlike other normal operations. Here are the comparison of a normal file and a abnormal one below:
----------- *Normal* one ( file:/prefix-2012-02-18_00410 )--------------------------------- 2012-02-18 02:32:15,873 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/ 10.16.35.236 cmd=create src=/prefix-2012-02-18_00410 dst=null perm=user:supergroup:rw-r--r-- 2012-02-18 02:32:15,905 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameSystem.allocateBlock*: /prefix-2012-02-18_00410. blk_-2839230806711337016_129091175 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /prefix-2012-02-18_00410 from client DFSClient_-36442571 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /prefix-2012-02-18_00410 is closed by DFSClient_-36442571 ----------- End -------------- ----------- *Problematic* one ( file:/prefix-2012-02-18_00409 )---------------------------- 2012-02-18 02:30:47,287 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/ 10.16.35.236 cmd=create src=/prefix-2012-02-18_00409 dst=null perm=user:supergroup:rw-r--r-- *<comment: no allocateBlock is found, and there is 60-90 sec gap between>* 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /prefix-2012-02-18_00409 from client DFSClient_748605466 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /prefix-2012-02-18_00409 is closed by DFSClient_748605466 ----------- End -------------------
Not sure this is more scribe-related or hdfs-related, let me try shooting the question in this mailing-list.
The 90 secs delay looks suspicious to me, and might cause this issue as I guess, did you guys meet this situation before? Any light shed on me about why is that and how to work it around to guarantee the entire file has been successfully saved into Hdfs? Any places I should focus on for improvements?(on hdfs side(e.g. hdfs tunning, gc tuning etc) or application side(e.g. timeout setting on hdfs api calling in scribe), your suggestions/thoughts are greatly appreciated.
Regards, Peter
-
Re: Nonempty files become empty after being saved to Hdfs.
Denny Ye 2012-02-20, 02:25
hi Peter, I met this problem in our environment at one time. It may caused by the blank data stream from Scribe to HDFS. In fixed rotate model, there is no any data. Otherwise, NameNode can start the 'allocate block' operation during the first data packet to HDFS client. Also, this only one tip to your doubt.
-Regards Denny Ye
2012/2/19 Tianqiang Peter Li <[EMAIL PROTECTED]>
> Hi, guys, > I am using scribe to write data directly to hdfs, it works well most of > the time, but sporadically, I found some of files(1-2 out of 6000 per day) > written to hdfs become empty files, interestingly, this only happens when > hdfs is busy handing lots of other write request at the same time, here is > log pulled from namenode's log output, it seems the allocateBlock operation > following the create operation is vanished, unlike other normal operations. > Here are the comparison of a normal file and a abnormal one below: > > ----------- *Normal* one ( > file:/prefix-2012-02-18_00410 )--------------------------------- > 2012-02-18 02:32:15,873 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/ > 10.16.35.236 cmd=create src=/prefix-2012-02-18_00410 > dst=null perm=user:supergroup:rw-r--r-- > 2012-02-18 02:32:15,905 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* > NameSystem.allocateBlock*: /prefix-2012-02-18_00410. > blk_-2839230806711337016_129091175 > 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: Removing > lease on file /prefix-2012-02-18_00410 from client DFSClient_-36442571 > 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.completeFile: file /prefix-2012-02-18_00410 is closed by > DFSClient_-36442571 > ----------- End -------------- > ----------- *Problematic* one ( > file:/prefix-2012-02-18_00409 )---------------------------- > 2012-02-18 02:30:47,287 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/ > 10.16.35.236 cmd=create src=/prefix-2012-02-18_00409 > dst=null perm=user:supergroup:rw-r--r-- > *<comment: no allocateBlock is found, and there is 60-90 sec gap between>* > 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: Removing > lease on file /prefix-2012-02-18_00409 from client DFSClient_748605466 > 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.completeFile: file /prefix-2012-02-18_00409 is closed by > DFSClient_748605466 > ----------- End ------------------- > > Not sure this is more scribe-related or hdfs-related, let me try shooting > the question in this mailing-list. > > The 90 secs delay looks suspicious to me, and might cause this issue as I > guess, did you guys meet this situation before? Any light shed on me about > why is that and how to work it around to guarantee the entire file has been > successfully saved into Hdfs? Any places I should focus on for > improvements?(on hdfs side(e.g. hdfs tunning, gc tuning etc) or application > side(e.g. timeout setting on hdfs api calling in scribe), your > suggestions/thoughts are greatly appreciated. > > Regards, > Peter > >
-
Re: Nonempty files become empty after being saved to Hdfs.
Tianqiang Peter Li 2012-02-24, 07:31
Hi, Denny, Thanks for the info, you are right about his, after some digging, I found there were no data loss, the root cause of getting empty file is that during the file rotation, the old file was closed, the new empty new file was created/opened, then hdfs tried to create the small file(acting as symbol link in hdfs) for this file, but for some reasons(gc?), the operation for creating this symbol link file stuck for 2 minutes, then right after it was successfully created, the scribe did a periodic check, found this file has passed the rotation window(the rotation period is 1min), then rotate out this newly-created empty file. It's good to know there is no data integrity issue. I am still looking at where this big hiccup comes from. Anyway, thanks for the helpful comments on this.
-Peter
On Sun, Feb 19, 2012 at 6:25 PM, Denny Ye <[EMAIL PROTECTED]> wrote:
> hi Peter, > I met this problem in our environment at one time. It may caused by > the blank data stream from Scribe to HDFS. In fixed rotate model, there is > no any data. Otherwise, NameNode can start the 'allocate block' operation > during the first data packet to HDFS client. Also, this only one tip to > your doubt. > > -Regards > Denny Ye > > > 2012/2/19 Tianqiang Peter Li <[EMAIL PROTECTED]> > >> Hi, guys, >> I am using scribe to write data directly to hdfs, it works well most of >> the time, but sporadically, I found some of files(1-2 out of 6000 per day) >> written to hdfs become empty files, interestingly, this only happens when >> hdfs is busy handing lots of other write request at the same time, here is >> log pulled from namenode's log output, it seems the allocateBlock operation >> following the create operation is vanished, unlike other normal operations. >> Here are the comparison of a normal file and a abnormal one below: >> >> ----------- *Normal* one ( >> file:/prefix-2012-02-18_00410 )--------------------------------- >> 2012-02-18 02:32:15,873 INFO >> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/ >> 10.16.35.236 cmd=create src=/prefix-2012-02-18_00410 >> dst=null perm=user:supergroup:rw-r--r-- >> 2012-02-18 02:32:15,905 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* >> NameSystem.allocateBlock*: /prefix-2012-02-18_00410. >> blk_-2839230806711337016_129091175 >> 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: Removing >> lease on file /prefix-2012-02-18_00410 from client DFSClient_-36442571 >> 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: DIR* >> NameSystem.completeFile: file /prefix-2012-02-18_00410 is closed by >> DFSClient_-36442571 >> ----------- End -------------- >> ----------- *Problematic* one ( >> file:/prefix-2012-02-18_00409 )---------------------------- >> 2012-02-18 02:30:47,287 INFO >> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/ >> 10.16.35.236 cmd=create src=/prefix-2012-02-18_00409 >> dst=null perm=user:supergroup:rw-r--r-- >> *<comment: no allocateBlock is found, and there is 60-90 sec gap between> >> * >> 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: Removing >> lease on file /prefix-2012-02-18_00409 from client DFSClient_748605466 >> 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: DIR* >> NameSystem.completeFile: file /prefix-2012-02-18_00409 is closed by >> DFSClient_748605466 >> ----------- End ------------------- >> >> Not sure this is more scribe-related or hdfs-related, let me try shooting >> the question in this mailing-list. >> >> The 90 secs delay looks suspicious to me, and might cause this issue as I >> guess, did you guys meet this situation before? Any light shed on me about >> why is that and how to work it around to guarantee the entire file has been >> successfully saved into Hdfs? Any places I should focus on for >> improvements?(on hdfs side(e.g. hdfs tunning, gc tuning etc) or application >> side(e.g. timeout setting on hdfs api calling in scribe), your
-
Re: Nonempty files become empty after being saved to Hdfs.
Denny Ye 2012-02-24, 07:55
hi Peter, There is no any operation log? Please using the state change log with debug level. I checked the code for creating 'symlink'. It looks like regular operation with other interfaces at NameNode. GC, directory lock, shell script may be doubtful point.
-Regards Denny Ye
2012/2/24 Tianqiang Peter Li <[EMAIL PROTECTED]>
> Hi, Denny, > Thanks for the info, you are right about his, after some digging, I found > there were no data loss, the root cause of getting empty file is that > during the file rotation, the old file was closed, the new empty new file > was created/opened, then hdfs tried to create the small file(acting as > symbol link in hdfs) for this file, but for some reasons(gc?), the > operation for creating this symbol link file stuck for 2 minutes, then > right after it was successfully created, the scribe did a periodic check, > found this file has passed the rotation window(the rotation period is > 1min), then rotate out this newly-created empty file. It's good to know > there is no data integrity issue. I am still looking at where this big > hiccup comes from. Anyway, thanks for the helpful comments on this. > > -Peter > > > On Sun, Feb 19, 2012 at 6:25 PM, Denny Ye <[EMAIL PROTECTED]> wrote: > >> hi Peter, >> I met this problem in our environment at one time. It may caused >> by the blank data stream from Scribe to HDFS. In fixed rotate model, there >> is no any data. Otherwise, NameNode can start the 'allocate block' >> operation during the first data packet to HDFS client. Also, this only one >> tip to your doubt. >> >> -Regards >> Denny Ye >> >> >> 2012/2/19 Tianqiang Peter Li <[EMAIL PROTECTED]> >> >>> Hi, guys, >>> I am using scribe to write data directly to hdfs, it works well most of >>> the time, but sporadically, I found some of files(1-2 out of 6000 per day) >>> written to hdfs become empty files, interestingly, this only happens when >>> hdfs is busy handing lots of other write request at the same time, here is >>> log pulled from namenode's log output, it seems the allocateBlock operation >>> following the create operation is vanished, unlike other normal operations. >>> Here are the comparison of a normal file and a abnormal one below: >>> >>> ----------- *Normal* one ( >>> file:/prefix-2012-02-18_00410 )--------------------------------- >>> 2012-02-18 02:32:15,873 INFO >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/ >>> 10.16.35.236 cmd=create src=/prefix-2012-02-18_00410 >>> dst=null perm=user:supergroup:rw-r--r-- >>> 2012-02-18 02:32:15,905 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* >>> NameSystem.allocateBlock*: /prefix-2012-02-18_00410. >>> blk_-2839230806711337016_129091175 >>> 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: >>> Removing lease on file /prefix-2012-02-18_00410 from client >>> DFSClient_-36442571 >>> 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: DIR* >>> NameSystem.completeFile: file /prefix-2012-02-18_00410 is closed by >>> DFSClient_-36442571 >>> ----------- End -------------- >>> ----------- *Problematic* one ( >>> file:/prefix-2012-02-18_00409 )---------------------------- >>> 2012-02-18 02:30:47,287 INFO >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/ >>> 10.16.35.236 cmd=create src=/prefix-2012-02-18_00409 >>> dst=null perm=user:supergroup:rw-r--r-- >>> *<comment: no allocateBlock is found, and there is 60-90 sec gap >>> between>* >>> 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: >>> Removing lease on file /prefix-2012-02-18_00409 from client >>> DFSClient_748605466 >>> 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: DIR* >>> NameSystem.completeFile: file /prefix-2012-02-18_00409 is closed by >>> DFSClient_748605466 >>> ----------- End ------------------- >>> >>> Not sure this is more scribe-related or hdfs-related, let me try >>> shooting the question in this mailing-list. >
|
|