Flume user mailing list: .tmp in hdfs sink
Earlier messages in this thread (quoted below):
  Mohit Anchlia  2012-11-11, 16:32
  Mike Percy     2012-11-15, 19:54
  Mohit Anchlia  2012-11-16, 04:14
  Mohit Anchlia  2012-11-16, 04:16

Re: .tmp in hdfs sink
In the past I have used per-minute rolls (YY/MM/DD/HH/MM) and closed the sink after 30 seconds. In my cases this worked almost perfectly, but it depends on your use case.

Cheers,
 Alex

On Nov 16, 2012, at 5:16 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> Another question I had was about rollover. What's the best way to roll over
> files in a reasonable timeframe? For instance, our path is YY/MM/DD/HH, so
> every hour there is a new file, and the previous hour's file just sits there
> as .tmp; it sometimes takes as much as an hour before the .tmp is closed and
> renamed to .snappy. In this situation, is there a way to tell Flume to roll
> files sooner based on some idle time limit?
>
> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <[EMAIL PROTECTED]>wrote:
>
>> Thanks Mike, it makes sense. Any way I can help?
>>
>>
>> On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Mohit, this is a complicated issue. I've filed
>>> https://issues.apache.org/jira/browse/FLUME-1714 to track it.
>>>
>>> In short, it would require a non-trivial amount of work to implement
>>> this, and it would need to be done carefully. I agree that it would be
>>> better if Flume handled this case more gracefully than it does today.
>>> Today, Flume assumes that you have some job that would go and clean up the
>>> .tmp files as needed, and that you understand that they could be partially
>>> written if a crash occurred.
>>>
>>> Regards,
>>> Mike
>>>
>>>
>>> On Sun, Nov 11, 2012 at 8:32 AM, Mohit Anchlia <[EMAIL PROTECTED]>wrote:
>>>
>>>> What we are seeing is that if Flume gets killed, either because of server
>>>> failure or for other reasons, it leaves the .tmp file around. Sometimes,
>>>> for whatever reason, the .tmp file is not readable. Is there a way to roll
>>>> over the .tmp file more gracefully?
>>>>
>>>
>>>
>>

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
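
Alex's per-minute-roll setup can be sketched as an HDFS sink configuration. This is a minimal sketch: the agent and sink names (agent1, hdfs-sink) are hypothetical, and the hdfs.idleTimeout line assumes a later Flume 1.x release where that property is available to close files that receive no writes; it may not address every case tracked in the JIRA above.

```properties
# Hypothetical agent/sink names; a sketch of per-minute rolls with a 30s roll interval.
agent1.sinks.hdfs-sink.type = hdfs
# Per-minute path buckets, roughly Alex's YY/MM/DD/HH/MM layout
agent1.sinks.hdfs-sink.hdfs.path = /flume/events/%y/%m/%d/%H/%M
# Close and rename each file after 30 seconds
agent1.sinks.hdfs-sink.hdfs.rollInterval = 30
# Disable size- and count-based rolling so only the time-based roll applies
agent1.sinks.hdfs-sink.hdfs.rollSize = 0
agent1.sinks.hdfs-sink.hdfs.rollCount = 0
# In later Flume 1.x releases: close a file after 60s with no appends,
# so an idle bucket's .tmp gets renamed without waiting for the roll interval
agent1.sinks.hdfs-sink.hdfs.idleTimeout = 60
```

Note that rollInterval counts from when the file is opened, while idleTimeout counts from the last write, which is closer to the idle-time behavior Mohit asked about.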
Later replies in this thread (not shown):
  Juhani Connolly  2012-11-16, 07:23
  Mike Percy       2012-11-20, 19:16
  Mohit Anchlia    2012-11-20, 20:53
  Brock Noland     2012-11-20, 20:56
  Mike Percy       2012-11-20, 23:11
  Mohit Anchlia    2012-11-20, 23:24
  Mohit Anchlia    2012-11-29, 04:26
  Juhani Connolly  2012-11-29, 05:20
  Mohit Anchlia    2012-11-29, 16:46