Flume, mail # user - .tmp in hdfs sink


Re: .tmp in hdfs sink
Mohit Anchlia 2012-11-29, 16:46
Thanks for your responses so far. I checked out flume-1.3.0 and have built
it. My next question: is hdfs.closeIdleTimeout the correct property name? Do
I need to set any other property? My current config is below. I write using
a YYYY/MM/DD/HH path format, so essentially I get 1-2 files per hour.
webanalytics.sinks.hdfsSink.hdfs.filePrefix = web
webanalytics.sinks.hdfsSink.hdfs.rollInterval = 4000
webanalytics.sinks.hdfsSink.hdfs.rollCount = 20000000
#webanalytics.sinks.hdfsSink.hdfs.rollCount = 40000
webanalytics.sinks.hdfsSink.hdfs.rollSize = 15000000000
webanalytics.sinks.hdfsSink.hdfs.fileType = SequenceFile
webanalytics.sinks.hdfsSink.hdfs.writeFormat = Text
webanalytics.sinks.hdfsSink.hdfs.codeC = snappy
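
For reference, a minimal sketch of how idle-file closing might be added to the sink above. It assumes the setting introduced by FLUME-1660 is named hdfs.idleTimeout (the name listed in the Flume user guide for the HDFS sink) rather than hdfs.closeIdleTimeout, and that its value is in seconds. The 300-second value is only illustrative and has not been checked against this particular build.

```properties
# Assumption: FLUME-1660 exposes idle closing as hdfs.idleTimeout, in seconds.
# A value of 0 disables idle closing; 300 is an illustrative value only.
webanalytics.sinks.hdfsSink.hdfs.idleTimeout = 300
```

Since hdfs.rollInterval is also in seconds, the 4000 above is roughly 67 minutes, so a file that stops receiving events would presumably stay open as .tmp until that interval expires unless an idle timeout closes it first.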
On Wed, Nov 28, 2012 at 9:20 PM, Juhani Connolly <[EMAIL PROTECTED]> wrote:

>  The changes are in both the 1.3 RC5 and in the 1.4 trunk
>
>
> On 11/29/2012 01:26 PM, Mohit Anchlia wrote:
>
> If I grab the last snapshot would I get these changes?
>
> On Tue, Nov 20, 2012 at 3:24 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>> that's awesome!
>>
>>
>> On Tue, Nov 20, 2012 at 3:11 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>
>>> Mohit,
>>> No problem, but Juhani did all the work. :)
>>>
>>> The behavior is that you can configure an HDFS sink to close a file if
>>> it hasn't gotten any writes in some time. After it's been idle for 5
>>> minutes or something, it gets closed. If you get a "late" event that goes
>>> to the same path after the file is closed, it will just create a new file
>>> in the same path as usual.
>>>
>>> Regards,
>>> Mike
>>>
>>>
>>> On Tue, Nov 20, 2012 at 12:56 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>>
>>>> We are currently voting on a 1.3.0 RC on the dev@ list:
>>>>
>>>> http://s.apache.org/OQ0W
>>>>
>>>> You don't have to be a committer to vote! :)
>>>>
>>>> Brock
>>>>
>>>> On Tue, Nov 20, 2012 at 2:53 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>>> > Thanks a lot!! With this change, what is the expected behaviour? After
>>>> > the file is closed, is a new file created for writes that come after
>>>> > closing the file?
>>>> >
>>>> > Thanks again for committing this change. Do you know when 1.3.0 will be
>>>> > out? I am currently using the snapshot version of 1.3.0.
>>>> >
>>>> > On Tue, Nov 20, 2012 at 11:16 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>>> >>
>>>> >> Mohit,
>>>> >> FLUME-1660 is now committed and it will be in 1.3.0. In the case where
>>>> >> you are using 1.2.0, I suggest running with hdfs.rollInterval set so the
>>>> >> files will roll normally.
>>>> >>
>>>> >> Regards,
>>>> >> Mike
>>>> >>
>>>> >>
>>>> >> On Thu, Nov 15, 2012 at 11:23 PM, Juhani Connolly
>>>> >> <[EMAIL PROTECTED]> wrote:
>>>> >>>
>>>> >>> I am actually working on a patch for exactly this; refer to FLUME-1660.
>>>> >>>
>>>> >>> The patch is on Review Board right now. I fixed a corner-case issue that
>>>> >>> came up in unit testing, but the implementation is not really to my
>>>> >>> satisfaction. If you are interested, please have a look and add your
>>>> >>> opinion.
>>>> >>>
>>>> >>> https://issues.apache.org/jira/browse/FLUME-1660
>>>> >>> https://reviews.apache.org/r/7659/
>>>> >>>
>>>> >>>
>>>> >>> On 11/16/2012 01:16 PM, Mohit Anchlia wrote:
>>>> >>>
>>>> >>> Another question I had was about rollover. What's the best way to
>>>> >>> roll over files in a reasonable timeframe? For instance, our path is
>>>> >>> YY/MM/DD/HH, so every hour there is a new file, and the previous hour's
>>>> >>> file just sits there as .tmp. Sometimes it takes as long as an hour
>>>> >>> before the .tmp is closed and renamed to .snappy. In this situation, is
>>>> >>> there a way to tell Flume to roll over files sooner, based on some idle
>>>> >>> time limit?
>>>> >>>
>>>> >>> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>>> >>>>
>>>> >>>> Thanks Mike, it makes sense. Any way I can help?
>>>> >>>>
>>>> >>>>
>>>> >>>> On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <[EMAIL PROTECTED]>