Flume, mail # user - Preserving origin syslog information


DSuiter RDX 2013-10-30, 15:11
Re: Preserving origin syslog information
Jeff Lord 2013-10-31, 23:47
Devin,

FLUME-1666 added a keepFields property that will allow you to preserve the
timestamp and hostname in the body of the generated flume event.
That patch was committed to trunk a couple of weeks ago, so it should be
available if you build from trunk.
https://issues.apache.org/jira/browse/FLUME-1666
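
As a rough sketch, enabling this in an agent configuration might look like
the fragment below. The agent name a1, source name r1, channel name c1, the
bind address, and the port are placeholders, not taken from this thread, and
keepFields is assumed to take a boolean value as in the FLUME-1666 patch.

```properties
# Hypothetical agent "a1" with one syslog TCP source "r1" and channel "c1".
a1.sources = r1
a1.channels = c1

a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# Assumed boolean form from FLUME-1666: keep the timestamp and hostname
# in the event body instead of stripping them out.
a1.sources.r1.keepFields = true

a1.channels.c1.type = memory
```

With a setting like this, each line written by a text-serializing HDFS sink
should begin with the original timestamp and hostname, assuming the build
includes the patch.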

Please note that this still does not preserve the priority.
I will be submitting another patch this evening which will do just that for
the syslogTCP, syslogUDP, and syslogMultiPort sources.

Best,

Jeff
On Wed, Oct 30, 2013 at 8:11 AM, DSuiter RDX <[EMAIL PROTECTED]> wrote:

> Hi, just a general behavioral question.
>
> We have a syslogTCP source catching remotely generated syslog events. They
> go to an Avro sink, which delivers them to an Avro source, then into an
> HDFS sink.
>
> I currently have a test replicating channel delivering it to HDFS with the
> avro_event serializer, and also delivering the same events to HDFS without
> the avro_event serializer. The latter results in a text-encoded aggregate
> file, which works well.
>
> The issue I would like clarification on is this:
>
> When it is saved to HDFS as Avro, there is an epoch timestamp, the
> hostname, and some severity and facility information being saved along with
> the message body. There is a "headers" and "body" section of the Avro
> schema, and the timestamp etc. is in the "headers" section, and the actual
> text is the "body."
>
> However, when the file is saved to HDFS as text, the only thing we get is
> the content of the "body" field, and there is no longer any host,
> timestamp, etc., even though those are components of the original message.
>
> Where are the components from the generating server being stripped away?
> By syslogTCP source, or by HDFS sink deserializing into text?
>
> Another way to summarize this is: When the server writing the events to
> syslog writes them, it writes with timestamp and host fields. If we use
> Avro the whole way, it keeps that information as headers, but if we save as
> text, no timestamp or host information is preserved. We would like it
> preserved so we can programmatically parse the timestamp to sort by day. We
> would also like to not have to deal with Avro MapReduce for the time being,
> as that has proved challenging. So, is there a way that I can get the WHOLE
> event body as the "body" using syslogTCP source, or do we need to look at
> exec source to tail the generating server /var/log/messages and send it
> that way?
>
> Thanks,
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>