Re: .tmp in hdfs sink
I am actually working on a patch for exactly this; see FLUME-1660.

The patch is on Review Board right now. I fixed a corner-case issue that
came up in unit testing, but I am not really satisfied with the
implementation. If you are interested, please have a look and add your opinion.
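
To illustrate the idle-time idea from the quoted question below, here is a
rough sketch of how an external cleanup job (not Flume itself) might pick
out .tmp files that have gone quiet. It is not the FLUME-1660 patch. The
FileInfo record, the stale_tmp_files helper, the example paths, and the
600-second limit are all hypothetical, and the sketch assumes the directory
listing and each file's last-modification time were obtained elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical record for one listed file."""
    path: str
    mtime: float  # last modification time, seconds since the epoch


def stale_tmp_files(files: Iterable[FileInfo],
                    now: float,
                    idle_limit_seconds: float,
                    suffix: str = ".tmp") -> List[str]:
    """Return paths ending in `suffix` that have been idle for at least
    `idle_limit_seconds`, preserving the input order."""
    return [
        f.path
        for f in files
        if f.path.endswith(suffix) and (now - f.mtime) >= idle_limit_seconds
    ]


if __name__ == "__main__":
    # Made-up listing: one old .tmp, one fresh .tmp, one finished file.
    listing = [
        FileInfo("/logs/12/11/16/12/events.1.snappy.tmp", mtime=1000.0),
        FileInfo("/logs/12/11/16/13/events.2.snappy.tmp", mtime=1900.0),
        FileInfo("/logs/12/11/16/12/events.0.snappy", mtime=500.0),
    ]
    for path in stale_tmp_files(listing, now=2000.0, idle_limit_seconds=600):
        print(path)
```

What to do with the selected files is a separate decision, since a .tmp
file left behind by a crash may be only partially written.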


On 11/16/2012 01:16 PM, Mohit Anchlia wrote:
> Another question I had was about rollover. What's the best way to
> roll over files in a reasonable timeframe? For instance, our path is
> YY/MM/DD/HH, so every hour there is a new file, and the previous
> hour's file just sits there as .tmp; sometimes it takes as long as an
> hour before the .tmp is closed and renamed to .snappy. In this
> situation, is there a way to tell Flume to roll over files sooner
> based on some idle time limit?
> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <[EMAIL PROTECTED]
> <mailto:[EMAIL PROTECTED]>> wrote:
>     Thanks Mike, that makes sense. Is there any way I can help?
>     On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <[EMAIL PROTECTED]
>     <mailto:[EMAIL PROTECTED]>> wrote:
>         Hi Mohit, this is a complicated issue. I've filed
>         https://issues.apache.org/jira/browse/FLUME-1714 to track it.
>         In short, it would require a non-trivial amount of work to
>         implement this, and it would need to be done carefully. I
>         agree that it would be better if Flume handled this case more
>         gracefully than it does today. Today, Flume assumes that you
>         have some job that would go and clean up the .tmp files as
>         needed, and that you understand that they could be partially
>         written if a crash occurred.
>         Regards,
>         Mike
>         On Sun, Nov 11, 2012 at 8:32 AM, Mohit Anchlia
>         <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>             What we are seeing is that if Flume gets killed, either
>             because of a server failure or for other reasons, it
>             leaves the .tmp file behind. Sometimes, for whatever
>             reason, the .tmp file is not readable. Is there a way to
>             roll over the .tmp file more gracefully?