Flume >> mail # user >> .tmp in hdfs sink


Re: .tmp in hdfs sink
Thanks for your response so far. I checked out flume-1.3.0 and have built
it. My next question: is hdfs.closeIdleTimeout the correct property? Do I
need to set any other property? My current config is below. I write in
YYYY/MM/DD/HH format, so essentially I get 1-2 files per hour.
webanalytics.sinks.hdfsSink.hdfs.filePrefix = web
webanalytics.sinks.hdfsSink.hdfs.rollInterval = 4000
webanalytics.sinks.hdfsSink.hdfs.rollCount = 20000000
#webanalytics.sinks.hdfsSink.hdfs.rollCount = 40000
webanalytics.sinks.hdfsSink.hdfs.rollSize = 15000000000
webanalytics.sinks.hdfsSink.hdfs.fileType = SequenceFile
webanalytics.sinks.hdfsSink.hdfs.writeFormat = Text
webanalytics.sinks.hdfsSink.hdfs.codeC = snappy
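
A hedged note on the property name asked about above: as far as the Flume user guide documents it, the idle-close setting that came out of FLUME-1660 is called hdfs.idleTimeout (in seconds, with 0 disabling it), not hdfs.closeIdleTimeout, so the name is worth checking against the documentation for the build in use. A minimal sketch, assuming that property name and an illustrative 300-second value borrowed from the "5 minutes or something" in the quoted explanation further down:

```properties
# Assumption: hdfs.idleTimeout is the idle-close property (seconds; 0 = disabled).
# 300 is an illustrative value, not a recommendation.
webanalytics.sinks.hdfsSink.hdfs.idleTimeout = 300

# Existing time-based roll, also in seconds: 4000 s is about 66 minutes 40 seconds.
webanalytics.sinks.hdfsSink.hdfs.rollInterval = 4000
```

With only the time-based roll, a file under an hourly path can stay open as .tmp for up to rollInterval seconds after it was opened, which would be consistent with the roughly hour-long delay described earlier in this thread.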
On Wed, Nov 28, 2012 at 9:20 PM, Juhani Connolly <[EMAIL PROTECTED]> wrote:

>  The changes are in both the 1.3 RC5 and in the 1.4 trunk
>
>
> On 11/29/2012 01:26 PM, Mohit Anchlia wrote:
>
> If I grab the last snapshot would I get these changes?
>
> On Tue, Nov 20, 2012 at 3:24 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>> that's awesome!
>>
>>
>> On Tue, Nov 20, 2012 at 3:11 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>
>>> Mohit,
>>> No problem, but Juhani did all the work. :)
>>>
>>> The behavior is that you can configure an HDFS sink to close a file if
>>> it hasn't gotten any writes in some time. After it's been idle for 5
>>> minutes or something, it gets closed. If you get a "late" event that goes
>>> to the same path after the file is closed, it will just create a new file
>>> in the same path as usual.
>>>
>>> Regards,
>>> Mike
>>>
>>>
>>> On Tue, Nov 20, 2012 at 12:56 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>>
>>>> We are currently voting on a 1.3.0 RC on the dev@ list:
>>>>
>>>> http://s.apache.org/OQ0W
>>>>
>>>> You don't have to be a committer to vote! :)
>>>>
>>>> Brock
>>>>
>>>> On Tue, Nov 20, 2012 at 2:53 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>>> > Thanks a lot!! Now with this what should be the expected behaviour?
>>>> > After file is closed a new file is created for writes that come after
>>>> > closing the file?
>>>> >
>>>> > Thanks again for committing this change. Do you know when 1.3.0 is
>>>> > out? I am currently using the snapshot version of 1.3.0
>>>> >
>>>> > On Tue, Nov 20, 2012 at 11:16 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>>> >>
>>>> >> Mohit,
>>>> >> FLUME-1660 is now committed and it will be in 1.3.0. In the case
>>>> >> where you are using 1.2.0, I suggest running with hdfs.rollInterval
>>>> >> set so the files will roll normally.
>>>> >>
>>>> >> Regards,
>>>> >> Mike
>>>> >>
>>>> >>
>>>> >> On Thu, Nov 15, 2012 at 11:23 PM, Juhani Connolly <[EMAIL PROTECTED]> wrote:
>>>> >>>
>>>> >>> I am actually working on a patch for exactly this, refer to FLUME-1660
>>>> >>>
>>>> >>> The patch is on review board right now, I fixed a corner case issue
>>>> >>> that came up with unit testing, but the implementation is not really
>>>> >>> to my satisfaction. If you are interested please have a look and add
>>>> >>> your opinion.
>>>> >>>
>>>> >>> https://issues.apache.org/jira/browse/FLUME-1660
>>>> >>> https://reviews.apache.org/r/7659/
>>>> >>>
>>>> >>>
>>>> >>> On 11/16/2012 01:16 PM, Mohit Anchlia wrote:
>>>> >>>
>>>> >>> Another question I had was about rollover. What's the best way to
>>>> >>> rollover files in reasonable timeframe? For instance our path is
>>>> >>> YY/MM/DD/HH so every hour there is new file and the -1 hr is just
>>>> >>> sitting with .tmp and it takes sometimes even hour before .tmp is
>>>> >>> closed and renamed to .snappy. In this situation is there a way to
>>>> >>> tell flume to rollover files sooner based on some idle time limit?
>>>> >>>
>>>> >>> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>>> >>>>
>>>> >>>> Thanks Mike it makes sense. Anyway I can help?
>>>> >>>>
>>>> >>>>
>>>> >>>> On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <[EMAIL PROTECTED]>