Flume, mail # user - .tmp in hdfs sink


Re: .tmp in hdfs sink
Mohit Anchlia 2012-11-20, 23:24
that's awesome!

On Tue, Nov 20, 2012 at 3:11 PM, Mike Percy <[EMAIL PROTECTED]> wrote:

> Mohit,
> No problem, but Juhani did all the work. :)
>
> The behavior is that you can configure an HDFS sink to close a file if it
> hasn't gotten any writes in some time. After it's been idle for 5 minutes
> or something, it gets closed. If you get a "late" event that goes to the
> same path after the file is closed, it will just create a new file in the
> same path as usual.
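>
> For reference, a minimal sketch of such a sink configuration, assuming the
> idle-close setting is the HDFS sink's hdfs.idleTimeout property (in seconds)
> added by FLUME-1660. The agent and sink names (a1, k1), the path, and the
> 300-second value are hypothetical and not taken from this thread:
>
> ```properties
> # Hypothetical agent "a1" with HDFS sink "k1"
> a1.sinks.k1.type = hdfs
> # Hourly buckets; illustrative path only
> a1.sinks.k1.hdfs.path = /flume/events/%y/%m/%d/%H
> # Close a file after 300 s (5 min) without writes; 0 disables idle closing
> a1.sinks.k1.hdfs.idleTimeout = 300
> ```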
>
> Regards,
> Mike
>
>
> On Tue, Nov 20, 2012 at 12:56 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>
>> We are currently voting on a 1.3.0 RC on the dev@ list:
>>
>> http://s.apache.org/OQ0W
>>
>> You don't have to be a committer to vote! :)
>>
>> Brock
>>
>> On Tue, Nov 20, 2012 at 2:53 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> > Thanks a lot!! Now, with this, what is the expected behaviour? After a
>> > file is closed, is a new file created for writes that arrive afterwards?
>> >
>> > Thanks again for committing this change. Do you know when 1.3.0 is out?
>> > I am currently using the snapshot version of 1.3.0.
>> >
>> > On Tue, Nov 20, 2012 at 11:16 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Mohit,
>> >> FLUME-1660 is now committed and it will be in 1.3.0. In the case where
>> >> you are using 1.2.0, I suggest running with hdfs.rollInterval set so the
>> >> files will roll normally.
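>> >>
>> >> As a sketch of that suggestion: hdfs.rollInterval is given in seconds.
>> >> The agent and sink names (a1, k1) and the 600-second value are
>> >> hypothetical, and zeroing the size and count triggers is an optional
>> >> assumption here so that only elapsed time causes a roll:
>> >>
>> >> ```properties
>> >> # Roll (close and rename) the current file every 600 s; 0 disables
>> >> a1.sinks.k1.hdfs.rollInterval = 600
>> >> # Disable size-based and event-count-based rolling
>> >> a1.sinks.k1.hdfs.rollSize = 0
>> >> a1.sinks.k1.hdfs.rollCount = 0
>> >> ```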
>> >>
>> >> Regards,
>> >> Mike
>> >>
>> >>
>> >> On Thu, Nov 15, 2012 at 11:23 PM, Juhani Connolly <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>> I am actually working on a patch for exactly this; see FLUME-1660.
>> >>>
>> >>> The patch is on Review Board right now. I fixed a corner-case issue that
>> >>> came up during unit testing, but the implementation is not really to my
>> >>> satisfaction. If you are interested, please have a look and add your
>> >>> opinion.
>> >>>
>> >>> https://issues.apache.org/jira/browse/FLUME-1660
>> >>> https://reviews.apache.org/r/7659/
>> >>>
>> >>>
>> >>> On 11/16/2012 01:16 PM, Mohit Anchlia wrote:
>> >>>
>> >>> Another question I had was about rollover. What's the best way to roll
>> >>> over files in a reasonable timeframe? For instance, our path is
>> >>> YY/MM/DD/HH, so every hour there is a new file, and the previous hour's
>> >>> file just sits there as .tmp; it sometimes takes as long as an hour
>> >>> before the .tmp is closed and renamed to .snappy. In this situation, is
>> >>> there a way to tell Flume to roll over files sooner, based on some idle
>> >>> time limit?
>> >>>
>> >>> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> >>>>
>> >>>> Thanks Mike, it makes sense. Any way I can help?
>> >>>>
>> >>>>
>> >>>> On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>> >>>>>
>> >>>>> Hi Mohit, this is a complicated issue. I've filed
>> >>>>> https://issues.apache.org/jira/browse/FLUME-1714 to track it.
>> >>>>>
>> >>>>> In short, it would require a non-trivial amount of work to implement
>> >>>>> this, and it would need to be done carefully. I agree that it would be
>> >>>>> better if Flume handled this case more gracefully than it does today.
>> >>>>> Today, Flume assumes that you have some job that would go and clean up
>> >>>>> the .tmp files as needed, and that you understand that they could be
>> >>>>> partially written if a crash occurred.
>> >>>>>
>> >>>>> Regards,
>> >>>>> Mike
>> >>>>>
>> >>>>>
>> >>>>> On Sun, Nov 11, 2012 at 8:32 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> >>>>>>
>> >>>>>> What we are seeing is that if Flume gets killed, either because of a
>> >>>>>> server failure or for other reasons, it leaves the .tmp file around.
>> >>>>>> Sometimes, for whatever reason, the .tmp file is not readable. Is there
>> >>>>>> a way to roll over the .tmp file more gracefully?
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce -
>> http://incubator.apache.org/mrunit/