Flume, mail # user - YAMLException in the elasticsearch sink


Re: YAMLException in the elasticsearch sink
Edward Sargisson 2013-06-12, 16:14
Hi Allan,
I think I would run it in a debugger and look at the buffer that way. You
should be able to put

JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend=n"

into your /etc/flume-ng/conf/flume-env.sh and then attach a debugger with
a breakpoint on
org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:178)

You could try the file_roll sink, but I'm not sure whether it would munge
the character sets itself.
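
For reference, the "Invalid UTF-8 start byte 0xfc" message in the trace quoted
further down can be reproduced outside Flume. The sketch below is only an
illustration, not a diagnosis of Allan's data: it assumes (without
confirmation from this thread) that the body contains text in a single-byte
encoding such as ISO-8859-1, where 0xfc is "ü". The class name Utf8Check and
the sample string are made up for the example, and it needs Java 7 or later
for StandardCharsets.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    /** Returns true only if the bytes form well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: the same text in two encodings.
        byte[] latin1 = "f\u00fcr".getBytes(StandardCharsets.ISO_8859_1); // bytes 66 fc 72
        byte[] utf8 = "f\u00fcr".getBytes(StandardCharsets.UTF_8);        // bytes 66 c3 bc 72

        System.out.println(isValidUtf8(latin1)); // false: 0xfc cannot start a UTF-8 sequence
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

If a check like this fails on the real event bodies, the source is probably
not sending UTF-8, whatever the sink does afterwards.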

Can you send me a link to flogger and your configuration for it? I'm not
familiar with it.

Cheers,
Edward

"
Edward,

Thanks for the reply. I'm not encoding my events in any specific character
set. I'm using flogger to send application logs (nodejs, ruby, perl etc)
into my flume infrastructure. It seems that only the ElasticSearchSink
encounters this issue. I'm not sure if the HDFS or file roll sinks are
forcing an encoding before trying to process (haven't checked the code
yet). Is there an easy way to have flume output the hex data of an event?
I'd love to provide the hex alongside the exception.
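
A minimal sketch of what such a hex dump could look like, assuming you can get
at the raw bytes (in Flume, Event.getBody() returns a byte[]). Where to call it
from, for example a custom interceptor or a debugger expression, is left open.
The class name EventHexDump and the sample body are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class EventHexDump {
    /** Formats bytes as space-separated, two-digit lowercase hex. */
    public static String toHex(byte[] body) {
        StringBuilder sb = new StringBuilder(body.length * 3);
        for (int i = 0; i < body.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as ffffff..
            sb.append(String.format("%02x", body[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical body: "für" encoded as ISO-8859-1, so 'ü' is the single byte 0xfc.
        byte[] body = "f\u00fcr".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toHex(body)); // prints "66 fc 72"
    }
}
```

Logging toHex(event.getBody()) next to the exception would give the pairing
of hex and stack trace asked for earlier in the thread.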

Thanks,
Allan
On Tue, Jun 11, 2013 at 12:33 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:

> Hi Allan,
> I would like to see the contents of the event you are trying to store - in
> hex - paired with the exception that relates to that message.
> The message "Invalid UTF-8 start byte 0xfc (at char #81, byte #-1)" indicates
> that there is a problem with the data and the character sets. In other
> words, are you encoding your data to be sent to Flume in UTF-8 or something
> else?
>
> Cheers,
> Edward
>
>
> "
> I think this might have to do specifically with the LogStash serializer,
> but I am unsure. After a period of time, it seems some of my events cause
> an exception and eventually fill up my memory channel. Below is the
> stacktrace, any help would be greatly appreciated. I can file a bug report
> but would like to know what kind of information to provide.
>
> 10 Jun 2013 09:52:34,360 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
> org.elasticsearch.common.jackson.dataformat.yaml.snakeyaml.error.YAMLException: java.io.CharConversionException: Invalid UTF-8 start byte 0xfc (at char #81, byte #-1)
>     at org.elasticsearch.common.jackson.dataformat.yaml.snakeyaml.reader.StreamReader.update(StreamReader.java:198)
>     at org.elasticsearch.common.jackson.dataformat.yaml.snakeyaml.reader.StreamReader.<init>(StreamReader.java:62)
>     at org.elasticsearch.common.jackson.dataformat.yaml.YAMLParser.<init>(YAMLParser.java:147)
>     at org.elasticsearch.common.jackson.dataformat.yaml.YAMLFactory._createParser(YAMLFactory.java:530)
>     at org.elasticsearch.common.jackson.dataformat.yaml.YAMLFactory.createJsonParser(YAMLFactory.java:420)
>     at org.elasticsearch.common.xcontent.yaml.YamlXContent.createParser(YamlXContent.java:83)
>     at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.addComplexField(ContentBuilderUtil.java:61)
>     at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.appendField(ContentBuilderUtil.java:47)
>     at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.appendBody(ElasticSearchLogStashEventSerializer.java:87)
>     at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.getContentBuilder(ElasticSearchLogStashEventSerializer.java:79)
>     at org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:178)
>     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.CharConversionException: Invalid UTF-8 start byte 0xfc (at char #81, byte #-1)
>     at org.elasticsearch.common.jackson.dataformat.yaml.UTF8Reader.reportInvalidInitial(UTF8Reader.java:395)
>     at org.elasticsearch.common.jackson.dataformat.yaml.UTF8Reader.read(UTF8Reader.java:247)