Flume >> mail # user >> High level technical overview of output bucketing for flume (old-gen) ?


Re: High level technical overview of output bucketing for flume (old-gen) ?
Thank you but I am afraid I wasn't clear enough.
I have no issue with the configuration and I understand output bucketing.

However, as far as I understand from the source, the flume old-gen syslog
source does not use the syslog timestamp. (It only cares about the priority,
which is not really a bad decision in itself, because that way the
implementation is 'compatible' with both the BSD and IETF syslog standards.) I
wrote a sink decorator to change that. It reads the syslog header,
uses the syslog timestamp (which is really the time when the log was
generated) and adds some metadata.
But I do not have a full understanding of the flume source code.
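
To make the idea concrete, here is a minimal sketch (in Python rather than Java, and not the actual decorator) of the header parsing step. The function name, the regular expressions, and the `default_year` and `fallback_millis` parameters are all hypothetical. It assumes UTC for BSD headers, which carry neither a year nor a time zone.

```python
import re
from datetime import datetime, timedelta, timezone

# BSD style (RFC 3164): "<PRI>Mmm dd hh:mm:ss host ..." (no year, no zone).
BSD_RE = re.compile(
    r"^<\d{1,3}>([A-Z][a-z]{2}) {1,2}(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) ")
# IETF style (RFC 5424): "<PRI>VERSION TIMESTAMP host ..."
IETF_RE = re.compile(r"^<\d{1,3}>\d{1,2} (\S+) ")

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt):
    """Convert an aware datetime to integer milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def syslog_timestamp_millis(line, default_year, fallback_millis):
    """Return the header timestamp in epoch milliseconds.

    Falls back to fallback_millis (for example the receive time) when
    the header cannot be parsed.
    """
    ietf = IETF_RE.match(line)
    if ietf:
        # "Z" is rewritten so that older Python versions accept it.
        text = ietf.group(1).replace("Z", "+00:00")
        try:
            dt = datetime.fromisoformat(text)
        except ValueError:
            return fallback_millis  # includes the nil value "-"
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: UTC
        return to_millis(dt)

    bsd = BSD_RE.match(line)
    if bsd and bsd.group(1) in MONTHS:
        month = MONTHS[bsd.group(1)]
        day, hour, minute, second = (int(g) for g in bsd.groups()[1:])
        try:
            # Assumptions: the caller supplies the year, and the zone is UTC.
            dt = datetime(default_year, month, day, hour, minute, second,
                          tzinfo=timezone.utc)
        except ValueError:
            return fallback_millis  # e.g. "Feb 30"
        return to_millis(dt)

    return fallback_millis
```

A decorator along these lines would then set the returned value as the event timestamp before handing the event to the wrapped sink.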

Could anyone point me to where the 'escape sequences date and times' are
interpreted in the flume source (i.e. which classes)?
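
To be clear about what I mean by 'interpreted', here is a rough sketch of the behaviour I expect, not flume's actual code. The function `expand_escapes` is hypothetical, it handles only a small subset of the sequences (%Y, %m, %d, %H), and it assumes UTC, which may not be the time zone flume uses.

```python
from datetime import datetime, timezone


def expand_escapes(pattern, timestamp_millis):
    """Replace %Y, %m, %d and %H in pattern using the event timestamp.

    Any other character, including an unknown %x sequence, is copied
    through unchanged.
    """
    dt = datetime.fromtimestamp(timestamp_millis // 1000, tz=timezone.utc)
    fields = {
        "Y": "%04d" % dt.year,   # four-digit year
        "m": "%02d" % dt.month,  # month, 01-12
        "d": "%02d" % dt.day,    # day of month, 01-31
        "H": "%02d" % dt.hour,   # hour, 00-23
    }
    out = []
    i = 0
    while i < len(pattern):
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if pattern[i] == "%" and nxt in fields:
            out.append(fields[nxt])
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)
```

With the event timestamp taken from the syslog header, a path pattern such as `/logs/%Y/%m/%d/%H/syslog` would then be bucketed by the time the log was generated rather than the time it was received.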

Thanks in advance

Bertrand
On Fri, Jan 4, 2013 at 4:06 PM, Alexander Alten-Lorenz
<[EMAIL PROTECTED]> wrote:

> Hi Bertrand,
>
> I wrote a blog post about this in 2011; it shows what bucketing can be
> used for:
>
> http://mapredit.blogspot.de/2011/10/centralized-logfile-management-across.html
>
> You can use the escape sequences to create directories; the sequences are
> filled in from the timestamp on the syslog event. So you can
> automatically create directories for year, month, day, hour
> or something like that.
>
> Best,
>  Alex
>
> On Jan 4, 2013, at 3:22 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I am using flume (old gen) as an extension to an existing syslog system
> > and would like to use the timestamp of the syslog message as the
> > timestamp of the flume event.
> > I guess the timestamp is used for the 'Fine grained escape sequences
> > date and times' but I don't have a clear understanding of it.
> > http://archive.cloudera.com/cdh/3/flume/UserGuide/#_output_bucketing
> >
> > Could someone point me to where those sequences (like %d) are
> > interpreted?
> > I would like to be sure I am not missing anything obvious.
> >
> > Thanks in advance
> >
> > Bertrand
> >
> > PS: I know an unrelated recommendation would be to use flume-ng, but
> > that is not the topic of this email.
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>
--
Bertrand Dechoux