NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Flume >> mail # user >> Roll based on date


Re: Roll based on date
Does the metrics endpoint show that events are still coming into this sink?

http://<agent hostname>:41414/metrics (for example, http://falcon:41414/metrics)

Also, can you post the rest of the config?
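A minimal Python sketch of the metrics check above. It assumes the agent was started with Flume's HTTP JSON monitoring enabled (for example -Dflume.monitoring.type=http -Dflume.monitoring.port=41414) and that the sink is named "sink", as in the configs quoted in this thread. It compares the sink's EventDrainSuccessCount counter between two snapshots; the helper function names are hypothetical.

```python
import json
import urllib.request


def fetch_metrics(url: str) -> dict:
    """Fetch the JSON document served by the agent's HTTP monitoring endpoint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def drained_count(metrics: dict, sink_name: str) -> int:
    """Return EventDrainSuccessCount for the component keyed "SINK.<sink_name>".

    Flume reports counter values as strings, so convert to int before comparing.
    """
    return int(metrics["SINK." + sink_name]["EventDrainSuccessCount"])


def sink_is_draining(before: dict, after: dict, sink_name: str = "sink") -> bool:
    """True if the sink successfully wrote more events between the two snapshots."""
    return drained_count(after, sink_name) > drained_count(before, sink_name)
```

For example, take one snapshot with fetch_metrics("http://falcon:41414/metrics"), wait a minute, take a second one, and pass both to sink_is_draining. If the count keeps increasing, events are still reaching this sink.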
On Thu, Oct 24, 2013 at 10:09 PM, Martinus m <[EMAIL PROTECTED]> wrote:

> Hi David,
>
> Roughly every few seconds.
>
> Thanks.
>
> Martinus
>
>
> On Thu, Oct 24, 2013 at 9:49 PM, David Sinclair <
> [EMAIL PROTECTED]> wrote:
>
>> How often are your events coming in?
>>
>>
>> On Thu, Oct 24, 2013 at 2:21 AM, Martinus m <[EMAIL PROTECTED]> wrote:
>>
>>> Hi David,
>>>
>>> Thanks for the example. I set it up exactly as above, but it only
>>> generated files for the first 15 minutes. After waiting for more than an
>>> hour, there has been no update at all in the S3 bucket.
>>>
>>> Thanks.
>>>
>>> Martinus
>>>
>>>
>>> On Wed, Oct 23, 2013 at 8:48 PM, David Sinclair <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> You can set all of the time-, size-, and count-based rolling policies to
>>>> zero and set an idle timeout on the sink. The config below uses a
>>>> 15-minute (900 s) timeout:
>>>>
>>>> agent.sinks.sink.hdfs.fileSuffix = FlumeData.%Y-%m-%d
>>>> agent.sinks.sink.hdfs.fileType = DataStream
>>>> agent.sinks.sink.hdfs.rollInterval = 0
>>>> agent.sinks.sink.hdfs.rollSize = 0
>>>> agent.sinks.sink.hdfs.batchSize = 0
>>>> agent.sinks.sink.hdfs.rollCount = 0
>>>> agent.sinks.sink.hdfs.idleTimeout = 900
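To make the interaction between these settings concrete, here is a simplified Python model of the roll decision. It is an illustration only, not Flume's actual BucketWriter logic: each trigger set to 0 is disabled, and an open file is closed by whichever enabled trigger fires first. The defaults in RollPolicy follow the Flume HDFS sink documentation (rollInterval 30 s, rollSize 1024 bytes, rollCount 10, idleTimeout 0); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RollPolicy:
    # Each trigger is disabled when set to 0; defaults mirror the HDFS sink docs.
    roll_interval: int = 30  # seconds since the file was opened
    roll_size: int = 1024    # bytes written to the file
    roll_count: int = 10     # events written to the file
    idle_timeout: int = 0    # seconds since the last write


def simulate(policy: RollPolicy, events: List[Tuple[int, int]]) -> List[Tuple[int, str]]:
    """Return (time, trigger) for each file close observed up to the last event.

    Input is time-ordered (timestamp_seconds, size_bytes) events. Simplified model:
    a file opens on the first event after a close, and time-based triggers are only
    checked when the next event arrives (a close pending after the last event is
    not reported).
    """
    closes: List[Tuple[int, str]] = []
    opened_at: Optional[int] = None
    last_write = 0
    size = count = 0
    for ts, nbytes in events:
        if opened_at is not None:
            # Time-based deadlines that passed before this event arrived.
            due = []
            if policy.roll_interval > 0 and opened_at + policy.roll_interval <= ts:
                due.append((opened_at + policy.roll_interval, "rollInterval"))
            if policy.idle_timeout > 0 and last_write + policy.idle_timeout <= ts:
                due.append((last_write + policy.idle_timeout, "idleTimeout"))
            if due:
                closes.append(min(due))  # the earliest deadline wins
                opened_at = None
        if opened_at is None:
            opened_at, size, count = ts, 0, 0
        size += nbytes
        count += 1
        last_write = ts
        # Size and count triggers are evaluated right after each write.
        if policy.roll_size > 0 and size >= policy.roll_size:
            closes.append((ts, "rollSize"))
            opened_at = None
        elif policy.roll_count > 0 and count >= policy.roll_count:
            closes.append((ts, "rollCount"))
            opened_at = None
    return closes
```

With the settings above (RollPolicy(0, 0, 0, 900)), a steady stream of events every few seconds never triggers a close in this model; only a gap longer than 900 s does. With the defaults, a slow trickle of small events closes a file every 10 events.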
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 10:17 PM, Martinus m <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi David,
>>>>>
>>>>> The requirement is actually only to roll once per day.
>>>>>
>>>>> Hi Devin,
>>>>>
>>>>> Thanks for sharing your experience. I also tried setting the config as
>>>>> follows:
>>>>>
>>>>> agent.sinks.sink.hdfs.fileSuffix = FlumeData.%Y-%m-%d
>>>>> agent.sinks.sink.hdfs.fileType = DataStream
>>>>> agent.sinks.sink.hdfs.rollInterval = 0
>>>>> agent.sinks.sink.hdfs.rollSize = 0
>>>>> agent.sinks.sink.hdfs.batchSize = 15000
>>>>> agent.sinks.sink.hdfs.rollCount = 0
>>>>>
>>>>> But I didn't see anything in the S3 bucket, so I guess I need to
>>>>> change rollInterval to 86400. As I understand it, rollInterval = 86400
>>>>> will roll the file after 24 hours, as you said, but it will not start a
>>>>> new file when the date changes if 24 hours have not yet passed (unless
>>>>> we put a date pattern in fileSuffix as above).
>>>>>
>>>>> Thanks to both of you.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Martinus
>>>>>
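Martinus's point about rollInterval can be checked with a little date arithmetic. The Python sketch below (helper names hypothetical) lists the windows produced by a pure interval roll that starts whenever the first file is opened, and contrasts them with bucketing each event by its calendar date, which is what a %Y-%m-%d escape in hdfs.path does on the Flume side. It assumes each event carries a timestamp; a real agent typically takes it from the event's timestamp header.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def interval_windows(start: datetime, end: datetime, roll_interval_s: int) -> List[Tuple[datetime, datetime]]:
    """(open, close) windows for a file rolled purely on rollInterval, starting at `start`."""
    windows = []
    step = timedelta(seconds=roll_interval_s)
    t = start
    while t < end:
        windows.append((t, min(t + step, end)))
        t += step
    return windows


def dates_covered(window: Tuple[datetime, datetime]) -> List[str]:
    """Calendar dates (YYYY-MM-DD) that events written during `window` can fall on."""
    opened, closed = window
    day = opened.date()
    last_day = (closed - timedelta(microseconds=1)).date()  # close time is exclusive
    dates = []
    while day <= last_day:
        dates.append(day.strftime("%Y-%m-%d"))
        day += timedelta(days=1)
    return dates


def day_bucket(ts: datetime) -> str:
    """Bucket name an event would get from a %Y-%m-%d escape in the path."""
    return ts.strftime("%Y-%m-%d")
```

A file opened at 10:00 with rollInterval = 86400 therefore always mixes two calendar days, while a date-based path separates events at midnight regardless of when the file was opened.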
>>>>>
>>>>> On Tue, Oct 22, 2013 at 11:16 PM, DSuiter RDX <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Martinus, you have to set all the other roll options to 0 explicitly
>>>>>> in the configuration if you want it to roll on only one parameter; it
>>>>>> will roll on whichever enabled trigger it meets first. If you want it
>>>>>> to roll once a day, you have to specifically disable all the other
>>>>>> roll triggers, because they all take their default settings unless told
>>>>>> otherwise. When I was experimenting, for example, it kept rolling every
>>>>>> 30 seconds even though I had hdfs.rollSize set to 64MB (our test data is
>>>>>> generated slowly). So I ended up with a pile of small (0.2 KB to ~19 KB)
>>>>>> files in a bunch of directories sorted by timestamp in ten-minute
>>>>>> intervals.
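The advice above amounts to merging your settings with the sink defaults and checking which triggers are still non-zero. A small Python sketch of that check follows. It assumes the agent.sinks.sink.hdfs.* naming used in this thread, treats the config as simple "key = value" lines, and takes the defaults from the Flume HDFS sink documentation (rollInterval 30, rollSize 1024, rollCount 10, idleTimeout 0). The function names are hypothetical.

```python
from typing import Dict, List

# HDFS sink roll defaults (0 disables a trigger).
ROLL_DEFAULTS = {"rollInterval": 30, "rollSize": 1024, "rollCount": 10, "idleTimeout": 0}


def parse_sink_settings(conf_text: str, agent: str = "agent", sink: str = "sink") -> Dict[str, str]:
    """Collect the hdfs.* properties of one sink from simple `key = value` lines."""
    prefix = f"{agent}.sinks.{sink}.hdfs."
    settings: Dict[str, str] = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and non-property lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith(prefix):
            settings[key[len(prefix):]] = value
    return settings


def active_triggers(conf_text: str, agent: str = "agent", sink: str = "sink") -> List[str]:
    """List roll triggers that remain enabled once defaults fill in unset keys."""
    settings = parse_sink_settings(conf_text, agent, sink)
    active = []
    for name, default in ROLL_DEFAULTS.items():
        value = int(settings.get(name, default))
        if value > 0:
            active.append(f"{name}={value}")
    return active
```

For a config that sets only rollSize to 64 MB, the check still reports rollInterval=30 and rollCount=10, consistent with the 30-second rolls described above. For the config Martinus posted (all three roll settings at 0, no idleTimeout), it reports no active trigger at all.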
>>>>>>
>>>>>> So, maybe a conf like this:
>>>>>>
>>>>>> agent.sinks.sink.type = hdfs
>>>>>> agent.sinks.sink.channel = channel
>>>>>> agent.sinks.sink.hdfs.path = (desired path string, yours looks fine)
>>>>>> agent.sinks.sink.hdfs.fileSuffix = .avro
>>>>>> agent.sinks.sink.serializer = avro_event
>>>>>> agent.sinks.sink.hdfs.fileType = DataStream
>>>>>> agent.sinks.sink.hdfs.rollInterval = 86400
>>>>>> agent.sinks.sink.hdfs.rollSize = 134217728
>>>>>> agent.sinks.sink.hdfs.batchSize = 15000
>>>>>> agent.sinks.sink.hdfs.rollCount = 0
>>>>>>
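The two numeric limits in this config decode as follows, assuming "MB" here means binary megabytes (MiB):

```latex
\begin{aligned}
\texttt{rollInterval} &= 24 \times 60 \times 60 = 86\,400\ \text{s} \\
\texttt{rollSize}     &= 128 \times 1024 \times 1024 = 134\,217\,728\ \text{bytes} = 128\ \text{MiB}
\end{aligned}
```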
>>>>>> This one will roll in HDFS at 24-hour intervals or at a 128MB file
>>>>>> size, and will close the file if it has 15000 events in it,
>>>>>> but if the hdfs.rollCount line was not set to "0" or some higher value (I