
Flume, mail # user - .tmp in hdfs sink


Re: .tmp in hdfs sink
Mohit Anchlia 2012-11-29, 04:26
If I grab the last snapshot would I get these changes?

On Tue, Nov 20, 2012 at 3:24 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> that's awesome!
>
>
> On Tue, Nov 20, 2012 at 3:11 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>
>> Mohit,
>> No problem, but Juhani did all the work. :)
>>
>> The behavior is that you can configure an HDFS sink to close a file if it
>> hasn't gotten any writes in some time. After it's been idle for 5 minutes
>> or something, it gets closed. If you get a "late" event that goes to the
>> same path after the file is closed, it will just create a new file in the
>> same path as usual.
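
[Editor's sketch] The close-on-idle behavior described above corresponds to the `hdfs.idleTimeout` property that FLUME-1660 adds in 1.3.0. A minimal configuration fragment is below; the agent, channel, and sink names are illustrative, and 300 seconds matches the "5 minutes" example:

```properties
# Hypothetical agent/channel/sink names for illustration.
# hdfs.idleTimeout is in seconds; 0 (the default) disables idle closing.
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memCh
agent.sinks.hdfsSink.hdfs.path = /flume/events/%Y/%m/%d/%H
agent.sinks.hdfsSink.hdfs.idleTimeout = 300
```

With this set, a file that receives no writes for 300 seconds is closed and renamed from its .tmp name; a later event bound for the same path simply opens a new file there.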
>>
>> Regards,
>> Mike
>>
>>
>> On Tue, Nov 20, 2012 at 12:56 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>
>>> We are currently voting on a 1.3.0 RC on the dev@ list:
>>>
>>> http://s.apache.org/OQ0W
>>>
>>> You don't have to be a committer to vote! :)
>>>
>>> Brock
>>>
>>> On Tue, Nov 20, 2012 at 2:53 PM, Mohit Anchlia <[EMAIL PROTECTED]>
>>> wrote:
>>> > Thanks a lot!! Now with this what should be the expected behaviour? After
>>> > file is closed a new file is created for writes that come after closing
>>> > the file?
>>> >
>>> > Thanks again for committing this change. Do you know when 1.3.0 is out?
>>> > I am currently using the snapshot version of 1.3.0
>>> >
>>> > On Tue, Nov 20, 2012 at 11:16 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>> >>
>>> >> Mohit,
>>> >> FLUME-1660 is now committed and it will be in 1.3.0. In the case where
>>> >> you are using 1.2.0, I suggest running with hdfs.rollInterval set so
>>> >> the files will roll normally.
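
[Editor's sketch] For 1.2.0, before FLUME-1660, the workaround Mike suggests is purely time-based rolling. A fragment like the following (names illustrative; `hdfs.rollInterval` is in seconds, and setting `hdfs.rollCount` and `hdfs.rollSize` to 0 disables the count- and size-based roll triggers so only the timer applies):

```properties
# Hypothetical sink name; roll every 10 minutes regardless of event volume.
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.rollInterval = 600
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.rollSize = 0
```

This guarantees each .tmp file is closed within the roll interval, at the cost of possibly producing many small files during quiet periods.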
>>> >>
>>> >> Regards,
>>> >> Mike
>>> >>
>>> >>
>>> >> On Thu, Nov 15, 2012 at 11:23 PM, Juhani Connolly
>>> >> <[EMAIL PROTECTED]> wrote:
>>> >>>
>>> >>> I am actually working on a patch for exactly this, refer to FLUME-1660
>>> >>>
>>> >>> The patch is on review board right now, I fixed a corner case issue
>>> >>> that came up with unit testing, but the implementation is not really
>>> >>> to my satisfaction. If you are interested please have a look and add
>>> >>> your opinion.
>>> >>>
>>> >>> https://issues.apache.org/jira/browse/FLUME-1660
>>> >>> https://reviews.apache.org/r/7659/
>>> >>>
>>> >>>
>>> >>> On 11/16/2012 01:16 PM, Mohit Anchlia wrote:
>>> >>>
>>> >>> Another question I had was about rollover. What's the best way to
>>> >>> rollover files in reasonable timeframe? For instance our path is
>>> >>> YY/MM/DD/HH so every hour there is new file and the -1 hr is just
>>> >>> sitting with .tmp and it takes sometimes even hour before .tmp is
>>> >>> closed and renamed to .snappy. In this situation is there a way to
>>> >>> tell flume to rollover files sooner based on some idle time limit?
>>> >>>
>>> >>> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <[EMAIL PROTECTED]>
>>> >>> wrote:
>>> >>>>
>>> >>>> Thanks Mike it makes sense. Anyway I can help?
>>> >>>>
>>> >>>>
>>> >>>> On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <[EMAIL PROTECTED]>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Hi Mohit, this is a complicated issue. I've filed
>>> >>>>> https://issues.apache.org/jira/browse/FLUME-1714 to track it.
>>> >>>>>
>>> >>>>> In short, it would require a non-trivial amount of work to implement
>>> >>>>> this, and it would need to be done carefully. I agree that it would
>>> >>>>> be better if Flume handled this case more gracefully than it does
>>> >>>>> today. Today, Flume assumes that you have some job that would go and
>>> >>>>> clean up the .tmp files as needed, and that you understand that they
>>> >>>>> could be partially written if a crash occurred.
>>> >>>>>
>>> >>>>> Regards,
>>> >>>>> Mike
>>> >>>>>
>>> >>>>>
>>> >>>>> On Sun, Nov 11, 2012 at 8:32 AM, Mohit Anchlia <[EMAIL PROTECTED]>
>>> >>>>> wrote:
>>> >>>>>>
>>> >>>>>> What we are seeing is that if flume gets killed either because of
>>> >>>>>> server failure or other reasons, it keeps around the .tmp file.
>>> >>>>>> Sometimes