Flume >> mail # user >> Roll based on date


Re: Roll based on date
Does the metrics endpoint show that events are still coming into this sink?

http://<hostname of agent>:41414/metrics (e.g. http://falcon:41414/metrics)

Also, can you post the rest of the config?
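
In case it helps, here is a minimal sketch for checking this from a script rather than a browser. It assumes the agent was started with HTTP monitoring enabled (-Dflume.monitoring.type=http -Dflume.monitoring.port=41414), that the JSON is keyed as "SINK.<name>" with string values, and that the sink is named "sink" as in the config quoted below. The function names are my own, not part of Flume.

```python
import json
from urllib.request import urlopen


def sink_counters(metrics, sink_name):
    """Return the numeric counters for one sink from a decoded metrics document."""
    # Assumption: each component is keyed as "<TYPE>.<name>", e.g. "SINK.sink".
    raw = metrics.get("SINK." + sink_name, {})
    counters = {}
    for key, value in raw.items():
        # Values are reported as strings; skip non-numeric fields such as "Type".
        try:
            counters[key] = int(value)
        except (TypeError, ValueError):
            continue
    return counters


def drained_between(before, after):
    """Events the sink reports as successfully written between two snapshots."""
    name = "EventDrainSuccessCount"
    return after.get(name, 0) - before.get(name, 0)


def fetch_metrics(url):
    """Fetch and decode the metrics JSON (needs a reachable agent)."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Take two snapshots a minute or so apart, e.g. sink_counters(fetch_metrics("http://falcon:41414/metrics"), "sink"), and pass both to drained_between. If the result stays at zero while the channel counters keep growing, the sink has probably stopped draining.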
On Thu, Oct 24, 2013 at 10:09 PM, Martinus m <[EMAIL PROTECTED]> wrote:

> Hi David,
>
> Almost every few seconds.
>
> Thanks.
>
> Martinus
>
>
> On Thu, Oct 24, 2013 at 9:49 PM, David Sinclair <
> [EMAIL PROTECTED]> wrote:
>
>> How often are your events coming in?
>>
>>
>> On Thu, Oct 24, 2013 at 2:21 AM, Martinus m <[EMAIL PROTECTED]> wrote:
>>
>>> Hi David,
>>>
>>> Thanks for the example. I have set it up just as above, but it only
>>> generated files for the first 15 minutes. After waiting for more than one
>>> hour, there is still no update at all in the S3 bucket.
>>>
>>> Thanks.
>>>
>>> Martinus
>>>
>>>
>>> On Wed, Oct 23, 2013 at 8:48 PM, David Sinclair <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> You can set all of the time/size based rolling policies to zero and set
>>>> an idle timeout on the sink. The example below uses a 15-minute timeout:
>>>>
>>>> agent.sinks.sink.hdfs.fileSuffix = FlumeData.%Y-%m-%d
>>>> agent.sinks.sink.hdfs.fileType = DataStream
>>>> agent.sinks.sink.hdfs.rollInterval = 0
>>>> agent.sinks.sink.hdfs.rollSize = 0
>>>> agent.sinks.sink.hdfs.batchSize = 0
>>>> agent.sinks.sink.hdfs.rollCount = 0
>>>> agent.sinks.sink.hdfs.idleTimeout = 900
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 10:17 PM, Martinus m <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi David,
>>>>>
>>>>> The requirement is actually just to roll once per day.
>>>>>
>>>>> Hi Devin,
>>>>>
>>>>> Thanks for sharing your experience. I also tried setting the config as
>>>>> follows:
>>>>>
>>>>> agent.sinks.sink.hdfs.fileSuffix = FlumeData.%Y-%m-%d
>>>>> agent.sinks.sink.hdfs.fileType = DataStream
>>>>> agent.sinks.sink.hdfs.rollInterval = 0
>>>>> agent.sinks.sink.hdfs.rollSize = 0
>>>>> agent.sinks.sink.hdfs.batchSize = 15000
>>>>> agent.sinks.sink.hdfs.rollCount = 0
>>>>>
>>>>> But I didn't see anything in the S3 bucket, so I guess I need to
>>>>> change rollInterval to 86400. As I understand it, rollInterval 86400
>>>>> will roll the file after 24 hours, as you said, but it will not start a
>>>>> new file when the date changes if 24 hours have not yet passed (unless
>>>>> we add the date pattern to fileSuffix as above).
>>>>>
>>>>> Thanks to both of you.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Martinus
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 11:16 PM, DSuiter RDX <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Martinus, you have to set all the other roll options to 0 explicitly
>>>>>> in the configuration if you want the sink to roll on only one parameter;
>>>>>> otherwise it rolls on whichever trigger is met first. If you
>>>>>> want it to roll once a day, you will have to specifically disable all the
>>>>>> other roll triggers, because they all take default settings unless
>>>>>> told not to. When I was experimenting, for example, it kept rolling every 30
>>>>>> seconds even though I had hdfs.rollSize set to 64MB (our test data is
>>>>>> generated slowly). So I ended up with a pile of small (0.2KB to ~19KB) files
>>>>>> in a bunch of directories sorted by timestamp in ten-minute intervals.
>>>>>>
>>>>>> So, maybe a conf like this:
>>>>>>
>>>>>> agent.sinks.sink.type = hdfs
>>>>>> agent.sinks.sink.channel = channel
>>>>>> agent.sinks.sink.hdfs.path = (desired path string, yours looks fine)
>>>>>> agent.sinks.sink.hdfs.fileSuffix = .avro
>>>>>> agent.sinks.sink.serializer = avro_event
>>>>>> agent.sinks.sink.hdfs.fileType = DataStream
>>>>>> agent.sinks.sink.hdfs.rollInterval = 86400
>>>>>> agent.sinks.sink.hdfs.rollSize = 134217728
>>>>>> agent.sinks.sink.hdfs.batchSize = 15000
>>>>>> agent.sinks.sink.hdfs.rollCount = 0
>>>>>>
>>>>>> This one will roll in HDFS in 24-hour intervals, or at 128MB file
>>>>>> size for the file, and will close the file if it has 15000 events in it,
>>>>>> but if the hdfs.rollCount line was not set to "0" or some higher value (I