Flume, mail # user - Flume 1.2.0 HDFS Sink Output File Question


Re: Flume 1.2.0 HDFS Sink Output File Question
Denny Ye 2012-07-31, 17:52
Hi Yongcheng,
    Flume does not re-check the destination left over from the previous
agent lifecycle, so the last temporary file is not reused by the current
process. Two things are worth checking in this case: 1. Was that temporary
file closed normally? If not, Flume should close it in an appropriate way,
for example through the 'recoverLease' interface. 2. Can that file name be
reused under the latest path pattern?

    In either case, we would want consistent behavior for the path
pattern. I agree with the point you raise. It probably needs some other
people to join the discussion.

-Regards
Denny Ye

2012/7/31 Yongcheng Li <[EMAIL PROTECTED]>

> Hi,
>
> I am using Flume 1.2.0 HDFS sink. When Flume crashes (being killed), a
> file name with a suffix of .tmp is generated. I believe it contains the
> data that were flushed into disk when the crash happens. But why does it
> have a .tmp suffix? Shouldn’t Flume just write it into a regular file
> (without .tmp)?
>
> I am using month/day/hour as part of my HDFS file name (%m_%d_%H). When
> the hour passes, it still has a file like 07_31_09.events.1343742385766.tmp
> with a size of zero. Shouldn’t Flume just close that file and remove the
> .tmp suffix? When I kill Flume, I can see data written into this file but
> still with a .tmp suffix.
>
> Thanks!
>
> Yongcheng
>
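
As a rough illustration of the naming discussed in this thread, the sketch
below rebuilds the file name from the question and shows what removing the
.tmp suffix would yield. It is not Flume code. It assumes that the numeric
part of the name (1343742385766) is a creation time in epoch milliseconds,
that the agent ran at UTC-4 (the offset that makes the hour come out as 09),
and that ".events" is a fixed part of the prefix; the function names are
hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expand_bucket(pattern: str, epoch_ms: int, utc_offset_hours: int) -> str:
    """Expand a %m_%d_%H style pattern for the given creation time."""
    tz = timezone(timedelta(hours=utc_offset_hours))  # assumed agent offset
    return datetime.fromtimestamp(epoch_ms / 1000, tz=tz).strftime(pattern)


def in_use_name(pattern: str, epoch_ms: int, utc_offset_hours: int,
                suffix: str = ".tmp") -> str:
    """Name of the file while it is still open (hypothetical layout)."""
    bucket = expand_bucket(pattern, epoch_ms, utc_offset_hours)
    return f"{bucket}.events.{epoch_ms}{suffix}"


def final_name(name: str, suffix: str = ".tmp") -> str:
    """Name the file would have once the in-use suffix is removed."""
    return name[:-len(suffix)] if name.endswith(suffix) else name


if __name__ == "__main__":
    before_crash = in_use_name("%m_%d_%H", 1343742385766, -4)
    print(before_crash)
    print(final_name(before_crash))
    # A restart one minute later produces a different name, so the old
    # .tmp file would not be picked up again under this naming scheme.
    print(in_use_name("%m_%d_%H", 1343742385766 + 60_000, -4))
```

Under these assumptions a restarted agent always generates a new name, which
is consistent with the old .tmp file being left behind until something
closes and renames it.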