Flume >> mail # user >> Roll based on date


Re: Roll based on date
Hi David,

The following is my configuration file:

agent.sources = seqGenSrc
agent.channels = fileChannel
agent.sinks = s3Sink

# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = syslogtcp
agent.sources.seqGenSrc.port = 5140
agent.sources.seqGenSrc.host = localhost
agent.sources.seqGenSrc.keepFields = true

# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = fileChannel

# Each sink's type must be defined
agent.sinks.s3Sink.type = hdfs

#Specify the channel the sink should use
agent.sinks.s3Sink.channel = fileChannel
agent.sinks.s3Sink.hdfs.path = s3n://awskeyid:awssecretkey@bucket_name/%{host}
agent.sinks.s3Sink.hdfs.filePrefix = FlumeData.%Y-%m-%d
agent.sinks.s3Sink.hdfs.rollInterval = 0
agent.sinks.s3Sink.hdfs.rollSize = 0
agent.sinks.s3Sink.hdfs.rollCount = 0
agent.sinks.s3Sink.hdfs.batchSize = 0
agent.sinks.s3Sink.hdfs.idleTimeout = 600
agent.sinks.s3Sink.hdfs.fileType = DataStream

# Each channel's type is defined.
agent.channels.fileChannel.type = file

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
agent.channels.fileChannel.capacity = 1000000

Thanks.

Martinus
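
For illustration, the `%Y-%m-%d` escape in `hdfs.filePrefix` above can be thought of as a date pattern expanded from each event's timestamp, so the file name changes once the date changes. Below is a minimal Python sketch of that expansion. It assumes an epoch-millisecond timestamp interpreted in UTC; the function name `expand_prefix` is hypothetical and this is not Flume's own code.

```python
from datetime import datetime, timezone


def expand_prefix(pattern: str, timestamp_ms: int) -> str:
    """Expand strftime-style escapes in `pattern` from an epoch-millisecond timestamp.

    Assumption: the timestamp is interpreted in UTC. A real agent may use
    local time instead, depending on its configuration.
    """
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return moment.strftime(pattern)


# Two events one second apart, on either side of midnight UTC.
before_midnight = 1382659199000  # 2013-10-24 23:59:59 UTC
after_midnight = 1382659200000   # 2013-10-25 00:00:00 UTC

print(expand_prefix("FlumeData.%Y-%m-%d", before_midnight))
print(expand_prefix("FlumeData.%Y-%m-%d", after_midnight))
```

The two events land under different prefixes even though no time- or size-based roll setting is involved.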
On Fri, Oct 25, 2013 at 10:20 PM, David Sinclair <
[EMAIL PROTECTED]> wrote:

> does the metrics endpoint show that events are still coming into this sink?
>
> http://hostname of agent:41414/metrics <http://falcon:41414/metrics>
>
> Also, can you post the rest of the config?
>
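
One way to answer the metrics question above is to take two snapshots of the JSON endpoint and compare a sink counter. The sketch below is a hedged example, not an official client. It assumes the endpoint returns a JSON object keyed by component (for example `SINK.s3Sink`), that counters are string-encoded integers, and that the sink exposes a counter named `EventDrainSuccessCount`; verify all three against your agent's actual output.

```python
import json
from urllib.request import urlopen


def fetch_metrics(url: str) -> dict:
    """Fetch one snapshot of the agent's JSON metrics (URL supplied by the caller)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def drained_between(before: dict, after: dict, sink_key: str,
                    counter: str = "EventDrainSuccessCount") -> int:
    """Return how much `counter` grew for `sink_key` between two snapshots.

    Assumption: counter values are string-encoded integers.
    """
    return int(after[sink_key][counter]) - int(before[sink_key][counter])


# Hand-written sample snapshots (not real agent output).
first = {"SINK.s3Sink": {"EventDrainSuccessCount": "1200"}}
second = {"SINK.s3Sink": {"EventDrainSuccessCount": "1350"}}
print(drained_between(first, second, "SINK.s3Sink"))
```

A difference of zero between snapshots taken a few minutes apart would suggest the sink has stopped draining events.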
>
> On Thu, Oct 24, 2013 at 10:09 PM, Martinus m <[EMAIL PROTECTED]> wrote:
>
>> Hi David,
>>
>> Almost every few seconds.
>>
>> Thanks.
>>
>> Martinus
>>
>>
>> On Thu, Oct 24, 2013 at 9:49 PM, David Sinclair <
>> [EMAIL PROTECTED]> wrote:
>>
>>> How often are your events coming in?
>>>
>>>
>>> On Thu, Oct 24, 2013 at 2:21 AM, Martinus m <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi David,
>>>>
>>>> Thanks for the example. I have set it just like above, but it only
>>>> generated files for the first 15 minutes. After waiting for more than one
>>>> hour, there was no update at all in the S3 bucket.
>>>>
>>>> Thanks.
>>>>
>>>> Martinus
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 8:48 PM, David Sinclair <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> You can set all of the time/size based rolling policies to zero and
>>>>> set an idle timeout on the sink. The example below uses a 15-minute timeout:
>>>>>
>>>>> agent.sinks.sink.hdfs.fileSuffix = FlumeData.%Y-%m-%d
>>>>> agent.sinks.sink.hdfs.fileType = DataStream
>>>>> agent.sinks.sink.hdfs.rollInterval = 0
>>>>> agent.sinks.sink.hdfs.rollSize = 0
>>>>> agent.sinks.sink.hdfs.batchSize = 0
>>>>> agent.sinks.sink.hdfs.rollCount = 0
>>>>> agent.sinks.sink.hdfs.idleTimeout = 900
>>>>>
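
The idle-timeout approach described above can be modeled as a timer that restarts on every write and closes the file only when it expires. The following Python sketch is a simplified model under that assumption, not Flume's implementation; the function name `idle_close_times` and the sample timestamps are hypothetical.

```python
def idle_close_times(write_times, idle_timeout):
    """Return the times (in seconds) at which an idle timer would close the file.

    `write_times` is a sorted list of write timestamps in seconds.
    An `idle_timeout` of 0 or less is treated as "never close when idle".
    """
    if idle_timeout <= 0 or not write_times:
        return []
    closes = []
    for current, following in zip(write_times, write_times[1:]):
        # The timer fires only if the next write arrives after it has expired.
        if following - current > idle_timeout:
            closes.append(current + idle_timeout)
    # After the final write nothing resets the timer, so it eventually fires.
    closes.append(write_times[-1] + idle_timeout)
    return closes


# Writes every few seconds, then a long gap, with a 900-second timeout.
print(idle_close_times([0, 5, 10, 1000], 900))
```

In this model, a stream with writes every few seconds never leaves a gap longer than the timeout, so the only close happens after the last write.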
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 10:17 PM, Martinus m <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Actually, the only requirement is to roll once per day.
>>>>>>
>>>>>> Hi Devin,
>>>>>>
>>>>>> Thanks for sharing your experience. I also tried setting the config
>>>>>> as follows:
>>>>>>
>>>>>> agent.sinks.sink.hdfs.fileSuffix = FlumeData.%Y-%m-%d
>>>>>> agent.sinks.sink.hdfs.fileType = DataStream
>>>>>> agent.sinks.sink.hdfs.rollInterval = 0
>>>>>> agent.sinks.sink.hdfs.rollSize = 0
>>>>>> agent.sinks.sink.hdfs.batchSize = 15000
>>>>>> agent.sinks.sink.hdfs.rollCount = 0
>>>>>>
>>>>>> But I didn't see anything in the S3 bucket, so I guess I need to
>>>>>> change rollInterval to 86400. In my understanding, a rollInterval of 86400
>>>>>> will roll the file after 24 hours, as you said, but it will not start a
>>>>>> new file when the day changes if 24 hours have not yet elapsed (unless
>>>>>> we add the date pattern to fileSuffix as above).
>>>>>>
>>>>>> Thanks to both of you.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Martinus
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 11:16 PM, DSuiter RDX <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Martinus, you have to set all the other roll options to 0 explicitly