Flume >> mail # user >> .tmp in hdfs sink


Re: .tmp in hdfs sink
In the past I have handled this with per-minute rolls (a YY/MM/DD/HH/MM path) and closed the sink's file after 30 seconds. That worked almost perfectly in my cases, but it depends on your use case.
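
As a rough illustration of the per-minute approach above, the sketch below renders HDFS sink settings from Python. The agent name `agent1`, the sink name `hdfsSink`, the base path `/flume/events`, and the helper function itself are hypothetical. The property names (`hdfs.path` with time escapes, `hdfs.rollInterval`, `hdfs.rollSize`, `hdfs.rollCount`, `hdfs.idleTimeout`) come from the Flume HDFS sink documentation, but `hdfs.idleTimeout` may not exist in older releases, so check the user guide for your version. Mapping "closed after 30 seconds" to `hdfs.rollInterval = 30` is an assumption, not something confirmed in this thread.

```python
def render_hdfs_sink_properties(agent, sink, base_path,
                                roll_interval_s=30, idle_timeout_s=0):
    """Render Flume HDFS sink properties for per-minute directories.

    agent, sink and base_path are placeholders chosen for illustration.
    """
    prefix = f"{agent}.sinks.{sink}"
    props = {
        f"{prefix}.type": "hdfs",
        # %y/%m/%d/%H/%M gives one directory per minute (YY/MM/DD/HH/MM).
        # The time escapes need a timestamp header on each event.
        f"{prefix}.hdfs.path": f"{base_path}/%y/%m/%d/%H/%M",
        # Assumption: roll (close and rename) the open file after 30 seconds.
        f"{prefix}.hdfs.rollInterval": roll_interval_s,
        # 0 disables size-based and count-based rolling, so only time applies.
        f"{prefix}.hdfs.rollSize": 0,
        f"{prefix}.hdfs.rollCount": 0,
        # Seconds of inactivity before an idle file is closed; 0 disables it.
        f"{prefix}.hdfs.idleTimeout": idle_timeout_s,
    }
    return "\n".join(f"{key} = {value}" for key, value in props.items())


if __name__ == "__main__":
    print(render_hdfs_sink_properties("agent1", "hdfsSink", "/flume/events"))
```

With an hourly path instead, a non-zero `idle_timeout_s` is the setting that would speak to the "idle time limit" asked about in the quoted question, where supported.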

Cheers,
 Alex

On Nov 16, 2012, at 5:16 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> Another question I had was about rollover. What's the best way to roll over
> files in a reasonable timeframe? For instance, our path is YY/MM/DD/HH, so
> every hour there is a new file, and the previous hour's file just sits there
> as .tmp. It sometimes takes as long as an hour before the .tmp is closed and
> renamed to .snappy. In this situation, is there a way to tell Flume to roll
> over files sooner, based on some idle time limit?
>
> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>> Thanks Mike, that makes sense. Is there any way I can help?
>>
>>
>> On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Mohit, this is a complicated issue. I've filed
>>> https://issues.apache.org/jira/browse/FLUME-1714 to track it.
>>>
>>> In short, it would require a non-trivial amount of work to implement
>>> this, and it would need to be done carefully. I agree that it would be
>>> better if Flume handled this case more gracefully than it does today.
>>> Today, Flume assumes that you have some job that would go and clean up the
>>> .tmp files as needed, and that you understand that they could be partially
>>> written if a crash occurred.
>>>
>>> Regards,
>>> Mike
>>>
>>>
>>> On Sun, Nov 11, 2012 at 8:32 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>>
>>>> What we are seeing is that if Flume gets killed, whether because of a
>>>> server failure or for other reasons, it leaves the .tmp file behind.
>>>> Sometimes, for whatever reason, the .tmp file is not readable. Is there a
>>>> way to roll over the .tmp file more gracefully?
>>>>
>>>
>>>
>>
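
Mike's point in the quoted message, that Flume expects a separate job to clean up leftover .tmp files, could be sketched roughly as below. This is only the selection step, under stated assumptions: the function name, the `(path, mtime)` input shape, the example paths, and the one-hour age threshold are all hypothetical, and how the listing is obtained from HDFS is left out. As the quoted message warns, the selected files may be partially written, so whether to delete, quarantine, or try to recover them is a separate decision.

```python
from typing import Iterable, List, Tuple


def stale_tmp_files(entries: Iterable[Tuple[str, float]],
                    now: float,
                    min_age_s: float = 3600.0,
                    suffix: str = ".tmp") -> List[str]:
    """Return paths ending in `suffix` not modified for at least `min_age_s`.

    entries: (path, modification time in epoch seconds) pairs, for example
             built from a recursive HDFS directory listing.
    now:     current time in epoch seconds, passed in to keep this testable.
    """
    return sorted(
        path
        for path, mtime in entries
        if path.endswith(suffix) and now - mtime >= min_age_s
    )


if __name__ == "__main__":
    now = 1_353_000_000.0  # arbitrary example timestamp
    listing = [
        ("/flume/12/11/15/10/events.1.snappy.tmp", now - 7200),  # old leftover
        ("/flume/12/11/15/12/events.2.snappy.tmp", now - 30),    # still in use
        ("/flume/12/11/15/10/events.0.snappy", now - 7200),      # closed file
    ]
    for path in stale_tmp_files(listing, now):
        print(path)
```

The age threshold should be comfortably longer than the sink's roll interval, otherwise the job could pick up a file that a running agent is still writing.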

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF