Flume, mail # user - High level technical overview of output bucketing for flume (old-gen) ?


Bertrand Dechoux 2013-01-04, 14:22
Alexander Alten-Lorenz 2013-01-04, 15:06
Re: High level technical overview of output bucketing for flume (old-gen) ?
Bertrand Dechoux 2013-01-04, 16:37
Thank you, but I am afraid I wasn't clear enough.
I have no issue with the configuration, and I understand output bucketing.

However, as far as I understand from the source code, the Flume old-gen syslog
source does not use the syslog timestamp. (It only cares about the priority,
which is not a bad decision in itself, because that way the implementation
is 'compatible' with both the BSD and IETF syslog standards.) I wrote a sink
decorator to change that. It reads the syslog header, uses the syslog
timestamp (which really is the time when the log was generated) and adds
some metadata.
But I do not have a full understanding of the Flume source.
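
For illustration, here is a minimal sketch of the kind of header parsing such a decorator needs, written in Python for brevity. It is not Flume code: `parse_bsd_syslog` is a hypothetical helper that only handles the BSD (RFC 3164) header. Because BSD timestamps carry no year or time zone, the caller has to supply the year, and UTC is assumed unless another zone is passed in.

```python
import re
from datetime import datetime, timezone

# RFC 3164 (BSD) header: "<PRI>Mmm dd hh:mm:ss HOSTNAME MSG".
# The day of month is space-padded, e.g. "Jan  4".
BSD_HEADER = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<msg>.*)$",
    re.DOTALL,
)

def parse_bsd_syslog(line, year, tz=timezone.utc):
    """Hypothetical helper: return (timestamp_ms, metadata), or None if the header does not match."""
    m = BSD_HEADER.match(line)
    if m is None:
        return None
    # BSD syslog timestamps have no year and no zone, so both come from the caller.
    # %b assumes English month abbreviations (the default C locale).
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    ts = ts.replace(tzinfo=tz)
    pri = int(m.group("pri"))
    metadata = {
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": pri % 8,
        "host": m.group("host"),
    }
    return int(ts.timestamp() * 1000), metadata
```

The returned milliseconds are what a decorator would use as the event timestamp in place of the time the event reached Flume.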

Could anyone point me to where the 'escape sequences date and times' are
interpreted in the Flume source (i.e. which classes)?
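
The expansion presumably amounts to something like the following Python sketch. This is an assumption, not Flume's implementation: `expand_bucket_path` is a hypothetical helper that supports only %Y, %m, %d, %H, %M, %S and %{name}, takes the date fields from the event timestamp (as the thread suggests), and formats them in UTC.

```python
import re
from datetime import datetime, timezone

# Hypothetical subset of the output-bucketing escapes: a few strftime-style
# date/time fields plus %{name} for event attributes such as the host.
_TIME_FIELDS = {"Y": "%Y", "m": "%m", "d": "%d", "H": "%H", "M": "%M", "S": "%S"}
_ESCAPE = re.compile(r"%\{(?P<attr>[^}]+)\}|%(?P<field>[YmdHMS])")

def expand_bucket_path(template, timestamp_ms, attrs):
    """Hypothetical helper: replace escape sequences using the event timestamp and attributes."""
    # Assumption: date fields come from the event's own timestamp, not the sink's
    # wall clock, so setting that timestamp from the syslog header moves the bucket.
    when = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)

    def substitute(m):
        if m.group("attr") is not None:
            # This sketch substitutes an empty string for a missing attribute.
            return str(attrs.get(m.group("attr"), ""))
        return when.strftime(_TIME_FIELDS[m.group("field")])

    return _ESCAPE.sub(substitute, template)
```

With this sketch, the template `/logs/%{host}/%Y/%m/%d/%H` and an event stamped 2013-01-04 14:22 UTC from host `mymachine` expand to `/logs/mymachine/2013/01/04/14`.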

Thanks in advance

Bertrand
On Fri, Jan 4, 2013 at 4:06 PM, Alexander Alten-Lorenz
<[EMAIL PROTECTED]> wrote:

> Hi Bertrand,
>
> I wrote a blog post about it in 2011; there you can see what bucketing
> can be used for:
>
> http://mapredit.blogspot.de/2011/10/centralized-logfile-management-across.html
>
> You can use the escape sequences to create directories: the sequences are
> filled in from the timestamp of the syslog event. That way you can
> automatically create directories per year, month, day, hour, or something
> similar.
>
> Best,
>  Alex
>
> On Jan 4, 2013, at 3:22 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I am using Flume (old gen) as an extension to an existing syslog system
> > and would like to use the timestamp of the syslog message as the
> > timestamp of the Flume event.
> > I guess the timestamp is used for the '*Fine grained escape sequences
> > date and times*', but I don't have a clear understanding of it.
> > http://archive.cloudera.com/cdh/3/flume/UserGuide/#_output_bucketing
> >
> > Could someone point me to where those sequences (like %d) are
> > interpreted?
> > I would like to be sure I am not missing anything obvious.
> >
> > Thanks in advance
> >
> > Bertrand
> >
> > PS: I know an unrelated recommendation would be to use flume-ng, but
> > that is not the topic of this email.
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>
--
Bertrand Dechoux