HDFS >> mail # user >> Nonempty files become empty after being saved to Hdfs.


Re: Nonempty files become empty after being saved to Hdfs.
Hi Peter,
       I ran into this problem in our environment once. It may be caused by
a blank data stream from Scribe to HDFS: with fixed-interval rotation, a
file can be rotated and closed without any data ever having arrived for it.
The NameNode only starts the 'allocate block' operation when the first data
packet arrives from the HDFS client, so an empty stream produces a file
with no block at all. This is only one possible explanation for what you
are seeing.

-Regards
Denny Ye
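The failure mode described above (fixed-interval rotation closing a file into which no data ever arrived) can be sketched with a small, Hadoop-free simulation; the function and file names here are made up for illustration:

```python
# Plain-Python sketch of fixed-interval file rotation, as in Scribe's
# fixed rotate model: one output file per period, closed at the end of
# the period whether or not any data arrived. An idle period therefore
# yields a zero-byte file -- on HDFS, a create + completeFile with no
# allocateBlock in between, since the NameNode only allocates a block
# when the first data packet arrives.
def simulate_fixed_rotation(batches_per_period):
    """batches_per_period: one list of record batches per rotation period.
    Returns a list of (filename, bytes_written) for the closed files."""
    files = []
    for period, batches in enumerate(batches_per_period):
        bytes_written = sum(len(b) for b in batches)
        files.append((f"prefix_{period:05d}", bytes_written))
    return files
```

For example, `simulate_fixed_rotation([[b"abc", b"de"], [], [b"x"]])` closes three files, and the idle middle period produces an empty `prefix_00001`.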

2012/2/19 Tianqiang Peter Li <[EMAIL PROTECTED]>

> Hi, guys,
> I am using Scribe to write data directly to HDFS. It works well most of
> the time, but sporadically some of the files (1-2 out of 6000 per day)
> written to HDFS end up empty. Interestingly, this only happens when HDFS
> is busy handling lots of other write requests at the same time. In the
> log pulled from the NameNode's output, the allocateBlock operation that
> should follow the create operation is missing, unlike in normal
> operations. Here is a comparison of a normal file and an abnormal one:
>
> ----------- *Normal* one  (
> file:/prefix-2012-02-18_00410 )---------------------------------
> 2012-02-18 02:32:15,873 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/
> 10.16.35.236        cmd=create      src=/prefix-2012-02-18_00410
> dst=null        perm=user:supergroup:rw-r--r--
> 2012-02-18 02:32:15,905 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK*
> NameSystem.allocateBlock*: /prefix-2012-02-18_00410.
> blk_-2839230806711337016_129091175
> 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: Removing
> lease on  file /prefix-2012-02-18_00410 from client DFSClient_-36442571
> 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file /prefix-2012-02-18_00410 is closed by
> DFSClient_-36442571
> ----------- End --------------
> ----------- *Problematic* one (
> file:/prefix-2012-02-18_00409 )----------------------------
> 2012-02-18 02:30:47,287 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/
> 10.16.35.236        cmd=create      src=/prefix-2012-02-18_00409
> dst=null        perm=user:supergroup:rw-r--r--
> *<comment: no allocateBlock entry appears, and there is a 60-90 sec gap
> between the create and the lease removal>*
> 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: Removing
> lease on  file /prefix-2012-02-18_00409 from client DFSClient_748605466
> 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file /prefix-2012-02-18_00409 is closed by
> DFSClient_748605466
> ----------- End -------------------
>
> Not sure whether this is more Scribe-related or HDFS-related, so let me
> try asking the question on this mailing list.
>
> The 90-second delay looks suspicious to me and may be the cause of this
> issue. Have you seen this situation before? Can you shed any light on why
> it happens, and on how to work around it so that the entire file is
> guaranteed to be saved to HDFS? Where should I focus for improvements: on
> the HDFS side (e.g. HDFS tuning, GC tuning) or on the application side
> (e.g. timeout settings on the HDFS API calls in Scribe)? Your
> suggestions/thoughts are greatly appreciated.
>
> Regards,
> Peter
>
>
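The log excerpts above suggest a quick check for affected files: scan the NameNode log for create entries that reach completeFile without an allocateBlock in between. A minimal sketch follows; the regexes are assumptions inferred from the excerpts in this thread, not an official log format:

```python
import re

def find_empty_file_candidates(log_lines):
    """Return paths that were created and closed (completeFile) with no
    allocateBlock in between -- the signature of the empty files above."""
    created, allocated, completed = set(), set(), set()
    for line in log_lines:
        m = re.search(r"cmd=create\s+src=(\S+)", line)
        if m:
            created.add(m.group(1))
        m = re.search(r"NameSystem\.allocateBlock\*?: (\S+)\.", line)
        if m:
            allocated.add(m.group(1))
        m = re.search(r"NameSystem\.completeFile: file (\S+) is closed", line)
        if m:
            completed.add(m.group(1))
    # Created and closed, but no block ever allocated -> zero-byte file.
    return sorted((created & completed) - allocated)
```

Feeding it the two files from the excerpts (one with an allocateBlock line, one without) flags only the second, matching the abnormal case described above.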