Flume >> mail # user >> YAMLException in the elasticsearch sink


Allan Feid 2013-06-10, 13:59
Edward Sargisson 2013-06-11, 16:33
Allan Feid 2013-06-12, 14:13
Edward Sargisson 2013-06-13, 16:13
Allan Feid 2013-06-13, 17:11
Allan Feid 2013-06-13, 19:33
Edward Sargisson 2013-06-12, 16:14

Re: YAMLException in the elasticsearch sink
Edward,

Flogger is available here:
https://github.com/cloudera/flume/tree/master/contrib/flogger

I've forked it to accept multiple -t args, but it basically uses the legacy
Thrift RPC protocol to add events from STDIN. Neither the file_roll nor the
HDFS sink runs into UTF-8 errors. The architecture is basically tail |
flogger -> local Flume instance -> log-processing Flume instance -> { hdfs,
file_roll, elasticsearch }. I can send specific configs if necessary, but
it's all pretty standard as per the User Guide.

Thanks,
Allan
On Wed, Jun 12, 2013 at 12:14 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:

> Hi Allan,
> I think I would run it in a debugger and look at the buffer that way. You
> should be able to put
>
>
> JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend=n"
>
> into your /etc/flume-ng/conf/flume-env.sh and then attach a debugger
> with a breakpoint on
> org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:178)
>
> You could try the file_roll sink, but I'm not sure whether it will munge the
> character sets itself.
>
> Can you send me a link to flogger and your configuration for it? I'm not
> familiar with it.
>
> Cheers,
> Edward
>
> "
> Edward,
>
> Thanks for the reply. I'm not encoding my events in any specific character
> set. I'm using flogger to send application logs (Node.js, Ruby, Perl, etc.)
> into my flume infrastructure. It seems that only the ElasticSearchSink
> encounters this issue. I'm not sure if the HDFS or file roll sinks are
> forcing an encoding before trying to process (haven't checked the code
> yet). Is there an easy way to have flume output the hex data of an event?
> I'd love to provide the hex alongside the exception.
>
> Thanks,
> Allan
>
>
> On Tue, Jun 11, 2013 at 12:33 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:
>
>> Hi Allan,
>> I would like to see the contents of the event you are trying to store -
>> in hex - paired with the exception that relates to that message.
>> This, "Invalid UTF-8 start byte 0xfc (at char #81, byte #-1)", indicates
>> that there is a problem with the data and the character sets. In other
>> words, are you encoding your data to be sent to Flume in UTF-8 or something
>> else?
>>
>> Cheers,
>> Edward
>>
>>
>> "
>> I think this might have to do specifically with the LogStash
>> serializer, but I am unsure. After a period of time, it seems some of my
>> events cause an exception and eventually fill up my memory channel. Below
>> is the stack trace; any help would be greatly appreciated. I can file a bug
>> report but would like to know what kind of information to provide.
>>
>> 10 Jun 2013 09:52:34,360 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160) - Unable to deliver event. Exception follows.
>> org.elasticsearch.common.jackson.dataformat.yaml.snakeyaml.error.YAMLException: java.io.CharConversionException: Invalid UTF-8 start byte 0xfc (at char #81, byte #-1)
>>     at org.elasticsearch.common.jackson.dataformat.yaml.snakeyaml.reader.StreamReader.update(StreamReader.java:198)
>>     at org.elasticsearch.common.jackson.dataformat.yaml.snakeyaml.reader.StreamReader.<init>(StreamReader.java:62)
>>     at org.elasticsearch.common.jackson.dataformat.yaml.YAMLParser.<init>(YAMLParser.java:147)
>>     at org.elasticsearch.common.jackson.dataformat.yaml.YAMLFactory._createParser(YAMLFactory.java:530)
>>     at org.elasticsearch.common.jackson.dataformat.yaml.YAMLFactory.createJsonParser(YAMLFactory.java:420)
>>     at org.elasticsearch.common.xcontent.yaml.YamlXContent.createParser(YamlXContent.java:83)
>>     at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.addComplexField(ContentBuilderUtil.java:61)
>>     at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.appendField(ContentBuilderUtil.java:47)
>>     at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.appendBody(ElasticSearchLogStashEventSerializer.java:87)
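
For pairing an event body's hex with the exception, as asked for earlier in
the thread, a standalone helper along these lines might do. It is a minimal
sketch using only the JDK, not anything taken from Flume or the
ElasticSearchSink. The class name BodyInspector and both method names are
made up for illustration, and the sample bytes are invented rather than
taken from a real event. It dumps a byte array as hex and reports the offset
of the first byte sequence that a strict UTF-8 decoder rejects. The sample
assumes, purely as a guess, that the offending byte came from ISO-8859-1
text, where 0xfc is "ü".

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Flume: inspects raw event bytes.
final class BodyInspector {

    /** Renders bytes as space-separated lowercase hex pairs. */
    static String toHex(byte[] body) {
        StringBuilder sb = new StringBuilder(body.length * 3);
        for (int i = 0; i < body.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", body[i] & 0xff));
        }
        return sb.toString();
    }

    /** Returns the offset of the first malformed UTF-8 sequence, or -1 if the input is valid. */
    static int firstInvalidUtf8Offset(byte[] body) {
        // REPORT makes the decoder stop at bad input instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(body);
        // UTF-8 never yields more chars than input bytes, so this capacity suffices.
        CharBuffer out = CharBuffer.allocate(body.length);
        CoderResult result = decoder.decode(in, out, true);
        // On error, the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // Invented sample: "für" encoded as ISO-8859-1, so the ü is the single byte 0xfc.
        byte[] latin1 = "f\u00fcr".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toHex(latin1));                  // 66 fc 72
        System.out.println(firstInvalidUtf8Offset(latin1)); // 1

        // The same text encoded as UTF-8 passes the strict decoder.
        byte[] utf8 = "f\u00fcr".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(utf8));                    // 66 c3 bc 72
        System.out.println(firstInvalidUtf8Offset(utf8));   // -1
    }
}
```

With a debugger attached, the same two calls could be evaluated against the
byte[] returned by the event's getBody() at the breakpoint, which would give
the hex and the failing offset side by side with the exception.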
Edward Sargisson 2013-06-14, 16:06
Allan Feid 2013-06-14, 16:13