Flume >> mail # user >> YAMLException in the elasticsearch sink

Re: YAMLException in the elasticsearch sink

Flogger is available here:

I've forked it to accept multiple -t args, but it basically uses the legacy
thrift/rpc protocol to add events from STDIN. Neither the file_roll nor the
HDFS sink runs into UTF-8 errors. The architecture is basically tail |
flogger -> local flume instance -> log processing flume instance -> { hdfs,
file_roll, elasticsearch }. I can send specific configs if necessary, but
it's all pretty standard as per the User Guide.
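
Since the file_roll sink accepts the same events without complaint, one way to get the requested hex next to the exception might be to scan the file_roll output for bodies that are not valid UTF-8. The following is only a rough sketch, not anything shipped with Flume: the function names are made up for illustration, and it assumes the file_roll output holds one raw event body per line. A byte such as 0xfc is "ü" in ISO-8859-1 but is never valid in UTF-8, so it should show up here if an upstream application is logging Latin-1 text.

```python
def find_invalid_utf8(body: bytes):
    """Return (offset, byte) for the first invalid UTF-8 byte, or None."""
    try:
        body.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return err.start, body[err.start]
    return None


def hex_dump(body: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns, and printable ASCII."""
    lines = []
    for off in range(0, len(body), width):
        chunk = body[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def report(path: str) -> int:
    """Print a hex dump of every line in `path` that is not valid UTF-8.

    Assumes one event body per line (an assumption about the sink output).
    Returns the number of offending lines found.
    """
    bad_count = 0
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, 1):
            body = raw.rstrip(b"\n")
            bad = find_invalid_utf8(body)
            if bad is not None:
                offset, byte = bad
                bad_count += 1
                print(f"event {lineno}: invalid UTF-8 byte "
                      f"0x{byte:02x} at offset {offset}")
                print(hex_dump(body))
    return bad_count
```

Calling report() on a file written by the file_roll sink would print only the offending events, which can then be matched against the "at char #81" position in the exception.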

On Wed, Jun 12, 2013 at 12:14 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:

> Hi Allan,
> I think I would run it in a debugger and look at the buffer that way. You
> should be able to put
> JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend=n"
> into your /etc/flume-ng/conf/flume-env.sh and then attach a debugger
> with a breakpoint on
> org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:178)
> You could try the file_roll sink, but I'm not sure whether it will munge the
> character sets itself.
> Can you send me a link to flogger and your configuration for it? I'm not
> familiar with it.
> Cheers,
> Edward
>
> Edward,
> Thanks for the reply. I'm not encoding my events in any specific character
> set. I'm using flogger to send application logs (nodejs, ruby, perl etc)
> into my flume infrastructure. It seems that only the ElasticSearchSink
> encounters this issue. I'm not sure if the HDFS or file roll sinks are
> forcing an encoding before trying to process (haven't checked the code
> yet). Is there an easy way to have flume output the hex data of an event?
> I'd love to provide the hex alongside the exception.
> Thanks,
> Allan
> On Tue, Jun 11, 2013 at 12:33 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:
>> Hi Allan,
>> I would like to see the contents of the event you are trying to store -
>> in hex - paired with the exception that relates to that message.
>> This, "Invalid UTF-8 start byte 0xfc (at char #81, byte #-1)" indicates
>> that there is a problem with the data and the character sets. In other
>> words, are you encoding your data to be sent to Flume in UTF-8 or something
>> else?
>> Cheers,
>> Edward
>>
>> I think this might have to do specifically with the LogStash
>> serializer, but I am unsure. After a period of time, it seems some of my
>> events cause an exception and eventually fill up my memory channel. Below
>> is the stack trace; any help would be greatly appreciated. I can file a bug
>> report but would like to know what kind of information to provide.
>> 10 Jun 2013 09:52:34,360 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
>> org.elasticsearch.common.jackson.dataformat.yaml.snakeyaml.error.YAMLException: java.io.CharConversionException: Invalid UTF-8 start byte 0xfc (at char #81, byte #-1)
>>     at org.elasticsearch.common.jackson.dataformat.yaml.snakeyaml.reader.StreamReader.update(StreamReader.java:198)
>>     at org.elasticsearch.common.jackson.dataformat.yaml.snakeyaml.reader.StreamReader.<init>(StreamReader.java:62)
>>     at org.elasticsearch.common.jackson.dataformat.yaml.YAMLParser.<init>(YAMLParser.java:147)
>>     at org.elasticsearch.common.jackson.dataformat.yaml.YAMLFactory._createParser(YAMLFactory.java:530)
>>     at org.elasticsearch.common.jackson.dataformat.yaml.YAMLFactory.createJsonParser(YAMLFactory.java:420)
>>     at org.elasticsearch.common.xcontent.yaml.YamlXContent.createParser(YamlXContent.java:83)
>>     at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.addComplexField(ContentBuilderUtil.java:61)
>>     at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.appendField(ContentBuilderUtil.java:47)
>>     at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.appendBody(ElasticSearchLogStashEventSerializer.java:87)