Flume >> mail # user >> Roll based on date


Re: Roll based on date
Hi David,

Thanks for your answer. I already did that, using %Y-%m-%d. But since the
sink still rolls based on size, it keeps generating two or more
FlumeData.%Y-%m-%d files with different suffixes.

Thanks.

Martinus
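
For reference, the size-based rolls described here can be switched off in the HDFS sink so that only the date in the file prefix drives new files. A minimal sketch (the `tier1`/`hdfsSink` agent and sink names follow the example quoted below and are assumptions):

```properties
# Setting these three roll triggers to 0 disables them, so a new file is
# only started when the escaped prefix/path changes (i.e. on date change).
tier1.sinks.hdfsSink.hdfs.rollSize = 0
tier1.sinks.hdfsSink.hdfs.rollCount = 0
tier1.sinks.hdfsSink.hdfs.rollInterval = 0
tier1.sinks.hdfsSink.hdfs.filePrefix = FlumeData.%Y-%m-%d
```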
On Fri, Oct 18, 2013 at 10:35 PM, David Sinclair <
[EMAIL PROTECTED]> wrote:

> The SyslogTcpSource will put a header on the flume event named
> 'timestamp'. This timestamp will be from the syslog entry. You could then
> set the filePrefix in the sink to grab this out.
> For example
>
> tier1.sinks.hdfsSink.hdfs.filePrefix = FlumeData.%{timestamp}
>
> dave
>
>
> On Thu, Oct 17, 2013 at 10:23 PM, Martinus m <[EMAIL PROTECTED]> wrote:
>
>> Hi David,
>>
>> It's syslogtcp.
>>
>> Thanks.
>>
>> Martinus
>>
>>
>> On Thu, Oct 17, 2013 at 9:17 PM, David Sinclair <
>> [EMAIL PROTECTED]> wrote:
>>
>>> What type of source are you using?
>>>
>>>
>>> On Wed, Oct 16, 2013 at 9:56 PM, Martinus m <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there any option in the HDFS sink to start rolling a new file
>>>> whenever the date in the log changes? For example, I have these logs:
>>>>
>>>> Oct 16 23:58:56 test-host : just test
>>>> Oct 16 23:59:51 test-host : test again
>>>> Oct 17 00:00:56 test-host : just test
>>>> Oct 17 00:00:56 test-host : test again
>>>>
>>>> Then I want it to make a file on S3 bucket with result like this :
>>>>
>>>> FlumeData.2013-10-16.1381916293017 <-- all the logs from Oct 16, 2013
>>>> should go here, and once the log date reaches Oct 17, 2013, it should
>>>> start sinking into a new file:
>>>>
>>>> FlumeData.2013-10-17.1381940047117
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>
>
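
Putting the pieces from this thread together, a minimal agent configuration might look like the sketch below. The agent layout, channel, port, and S3 bucket name are illustrative assumptions; only the source type, the `timestamp` header behavior, and the sink escape sequences come from the thread:

```properties
# Hypothetical single-agent layout; names other than the keys discussed
# above are assumptions.
tier1.sources = syslogSrc
tier1.channels = memCh
tier1.sinks = hdfsSink

# Syslog TCP source: adds a 'timestamp' header taken from the syslog entry.
tier1.sources.syslogSrc.type = syslogtcp
tier1.sources.syslogSrc.port = 5140
tier1.sources.syslogSrc.channels = memCh

tier1.channels.memCh.type = memory

# HDFS sink writing to S3; %Y-%m-%d is resolved from the event's
# 'timestamp' header, so files are bucketed by the log's own date.
tier1.sinks.hdfsSink.type = hdfs
tier1.sinks.hdfsSink.channel = memCh
tier1.sinks.hdfsSink.hdfs.path = s3n://mybucket/logs/%Y-%m-%d
tier1.sinks.hdfsSink.hdfs.filePrefix = FlumeData.%Y-%m-%d
# Disable size/count/interval rolls so files roll only on date change.
tier1.sinks.hdfsSink.hdfs.rollSize = 0
tier1.sinks.hdfsSink.hdfs.rollCount = 0
tier1.sinks.hdfsSink.hdfs.rollInterval = 0
```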