Flume >> mail # user >> Re: Problems with time variables in HDFS path


Hey Alex,

I used the logger command to generate a syslog message and rsyslogd to send it, to rule out any malformed message. rsyslogd speaks RFC 3164 by default. If I use rsyslogd to receive and store the message, all the information is intact, so the message itself should be correct.

Running your command breaks the CDH 4.0.1 flume-ng version. And as far as I can see from your pasted output, it is broken in your version too: the host is filled with "a", but it should be "host".

I also tried writing to a logger sink. That doesn't break Flume, but it illustrates the problem a bit more. Only writing to an HDFS sink breaks it (if you use %Y and so on inside the path).

echo "<13>Jun 20 12:12:12 host foo[345]: a syslog message with" > /tmp/foo; nc -v aHostname 5140 < /tmp/foo
2012-07-11 16:42:58,779 INFO sink.LoggerSink: Event: { headers:{timestamp=1340187132000, Severity=5, host=host, Facility=8} body: 66 6F 6F 5B 33 34 35 5D 3A 20 61 20 73 79 73 6C foo[345]: a sysl }

As you see, everything is fine. Timestamp is set, host is filled correctly and the HDFS sink would be able to process this message.

echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with" > /tmp/foo; nc -v aHostname 5140 < /tmp/foo
2012-07-11 16:42:34,006 INFO sink.LoggerSink: Event: { headers:{Severity=5, host=a, Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 syslog message w }

This one is broken: host is "a" and there is no timestamp :)
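To make the failure mode concrete, here is a minimal Java sketch of what happens when the timestamp header is missing. This is a hypothetical simplification, not Flume's actual BucketPath code: the class, method, and path pattern are illustrative, but the failing call chain (a null header value passed to Long.valueOf) matches the stack trace quoted below in the original report.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class BucketPathSketch {
    // Hypothetical simplification of BucketPath.replaceShorthand:
    // time escapes such as %Y are resolved from the event's
    // "timestamp" header. If the header is absent, headers.get()
    // returns null and Long.valueOf(null) throws
    // NumberFormatException, as in the stack trace below.
    static String escape(String path, Map<String, String> headers) {
        if (path.contains("%Y")) {
            long ts = Long.valueOf(headers.get("timestamp"));
            return path.replace("%Y",
                    new SimpleDateFormat("yyyy").format(new Date(ts)));
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("timestamp", "1340187132000"); // as in the working event above
        System.out.println(escape("/flume/%Y/events", headers)); // prints /flume/2012/events

        try {
            escape("/flume/%Y/events", new HashMap<>()); // timestamp header missing
        } catch (NumberFormatException e) {
            System.out.println("caught NumberFormatException");
        }
    }
}
```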

Best regards,
Chris

-----Original Message-----
From: alo alt [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 11, 2012 14:54
To: [EMAIL PROTECTED]
Subject: Re: Problems with time variables in HDFS path

Chris,

syslog is an RFC-defined protocol; we support only the RFC 5424 and RFC 3164 formats. As long as you use valid syslog events, it works:

echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with -" > /tmp/foo; nc -v YOUR_IP 5140 < /tmp/foo

12/07/11 14:51:52 INFO sink.LoggerSink: Event: { headers:{Severity=5, host=a, Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 syslog message w }
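For reference, a rough sketch of the RFC 3164 header shape being discussed: <PRI>Mmm dd hh:mm:ss HOSTNAME TAG: CONTENT. The regex and class name below are illustrative, not Flume's actual parser; a parse that follows the RFC recovers "host" as the hostname field regardless of what follows the tag, which is why a " - " in the message body should not change the host.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rfc3164Sketch {
    // Loose approximation of an RFC 3164 header:
    // <PRI>Mmm dd hh:mm:ss HOSTNAME remainder
    static final Pattern HEADER = Pattern.compile(
            "^<(\\d{1,3})>(\\w{3}\\s+\\d{1,2} \\d{2}:\\d{2}:\\d{2}) (\\S+) (.*)$");

    // Returns the hostname field, or null if the line does not match.
    static String host(String line) {
        Matcher m = HEADER.matcher(line);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // Both example messages from this thread yield "host":
        System.out.println(host("<13>Jun 20 12:12:12 host foo[345]: a syslog message with"));
        System.out.println(host("<13>Jun 20 12:12:12 host foo[345]: - a syslog message with"));
        // prints:
        // host
        // host
    }
}
```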
- Alex

P.S. I didn't see your mail to me directly, but it's better to write to the list ;)

On Jul 11, 2012, at 12:15 PM, Juhani Connolly wrote:

> The time variables depend on the existence of a header with the key "timestamp". If it isn't there, it tries to parse a non-existent header to calculate the time, and this happens. I don't believe it has anything to do with the contents of your log message.
>
> For the easiest way to add the header, I would recommend trying 1.2.0
> as soon as it is released (or you can grab the current release
> candidate, or even the 1.3.0 trunk, which I'm running right now without
> any serious issues) and using the TimestampInterceptor there. As this
> is a frequent question, I've made a JIRA to document this dependency
> properly: https://issues.apache.org/jira/browse/FLUME-1364
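A hedged sketch of what the suggested interceptor could look like in an agent's properties file. The agent, source, and sink names and the HDFS path are illustrative placeholders; `type = timestamp` is the alias the TimestampInterceptor registers in Flume 1.2.0 and later.

```
# Illustrative names (agent/syslog1/hdfs1); only the interceptor lines matter.
agent.sources.syslog1.interceptors = ts
agent.sources.syslog1.interceptors.ts.type = timestamp
# With the timestamp header guaranteed, time escapes in the path resolve:
agent.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/%Y-%m-%d
```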
>
> On 07/11/2012 06:41 PM, Christian Schroer wrote:
>> Hi,
>>
>> we are running into a strange problem using Flume-NG 1.1.0 from CDH 4.0.1.
>>
>> Setup:
>> Flume-NG opens a TCP syslog port, collects all messages and forwards them directly into HDFS. This works fine until we try to forward MS IIS logs in W3C format. The cause seems to be a " - " inside the log message. I could reproduce the problem using rsyslogd to forward all syslog messages to Flume:
>>
>> logger "Hello this is a test" => Works fine :)
>>
>> logger "hello - this will break" => breaks flume :(
>>
>> If I remove the time variables from the HDFS path in our configuration (attached) everything is working fine...
>>
>> Exception:
>>
>> 2012-07-11 11:08:18,292 ERROR hdfs.HDFSEventSink: process failed
>> java.lang.NumberFormatException: null
>>         at java.lang.Long.parseLong(Long.java:375)
>>         at java.lang.Long.valueOf(Long.java:525)
>>         at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>>         at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF