

Alexander Alten-Lorenz 2012-11-16, 06:51
Re: .tmp in hdfs sink
I am actually working on a patch for exactly this; see FLUME-1660.

The patch is on Review Board right now. I fixed a corner-case issue that
came up in unit testing, but the implementation is not really to my
satisfaction. If you are interested, please have a look and add your opinion.

https://issues.apache.org/jira/browse/FLUME-1660
https://reviews.apache.org/r/7659/

On 11/16/2012 01:16 PM, Mohit Anchlia wrote:
> Another question I had was about rollover. What's the best way to
> roll over files in a reasonable timeframe? For instance, our path is
> YY/MM/DD/HH, so every hour there is a new file, and the previous
> hour's file just sits there as .tmp. Sometimes it takes as long as an
> hour before the .tmp is closed and renamed to .snappy. In this
> situation, is there a way to tell Flume to roll over files sooner
> based on some idle time limit?
>
> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>     Thanks Mike, that makes sense. Any way I can help?
>
>
>     On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>
>         Hi Mohit, this is a complicated issue. I've filed
>         https://issues.apache.org/jira/browse/FLUME-1714 to track it.
>
>         In short, it would require a non-trivial amount of work to
>         implement this, and it would need to be done carefully. I
>         agree that it would be better if Flume handled this case more
>         gracefully than it does today. Today, Flume assumes that you
>         have some job that would go and clean up the .tmp files as
>         needed, and that you understand that they could be partially
>         written if a crash occurred.
>
>         Regards,
>         Mike
>
>
>         On Sun, Nov 11, 2012 at 8:32 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>             What we are seeing is that if Flume gets killed, either
>             because of server failure or for other reasons, it leaves
>             the .tmp file behind. Sometimes, for whatever reason, the
>             .tmp file is not readable. Is there a way to roll over the
>             .tmp file more gracefully?
>
>
>
>
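
Mike's point in the quoted message, that Flume expects a separate job to clean up leftover .tmp files, could be sketched roughly as below. This is only an illustration and not part of Flume. The FileEntry type, the find_stale_tmp_files function, the one-hour threshold, and the example paths are all hypothetical. The listing is assumed to come from whatever tool you already use to list the sink directory. A stale .tmp file may be partially written, so the sketch only reports candidates and leaves the decision to rename or delete to the operator.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileEntry:
    """One entry from a directory listing (hypothetical shape, not a Flume or HDFS type)."""
    path: str
    modified_epoch_s: float  # last modification time, in seconds since the epoch


def find_stale_tmp_files(entries: Iterable[FileEntry],
                         now_epoch_s: float,
                         max_idle_s: float = 3600.0,
                         suffix: str = ".tmp") -> List[str]:
    """Return the paths ending in `suffix` whose idle time exceeds `max_idle_s`."""
    stale = []
    for entry in entries:
        if not entry.path.endswith(suffix):
            continue  # already closed and renamed, nothing to do
        idle_s = now_epoch_s - entry.modified_epoch_s
        if idle_s > max_idle_s:
            stale.append(entry.path)
    return sorted(stale)


if __name__ == "__main__":
    now = 10_000.0
    # Hypothetical listing: paths and timestamps are made up for illustration.
    listing = [
        FileEntry("/flume/12/11/16/03/events.1.snappy.tmp", now - 7200),  # idle for 2 hours
        FileEntry("/flume/12/11/16/04/events.2.snappy.tmp", now - 60),    # recently written
        FileEntry("/flume/12/11/16/03/events.0.snappy", now - 9000),      # closed and renamed
    ]
    for path in find_stale_tmp_files(listing, now):
        print(path)
```

The modification time reported for a file that is still open may lag behind the actual writes, so a generous threshold is safer than a tight one.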
