Flume, mail # user - YAMLException in the elasticsearch sink


Re: YAMLException in the elasticsearch sink
Allan Feid 2013-06-13, 19:33
After even further investigation, it seems the ContentBuilderUtil
calls org.elasticsearch.common.xcontent.XContentFactory, specifically the
xContentType method seen here:

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/common/xcontent/XContentFactory.java#L116

If that method returns null, the code just forces the data to be a string;
otherwise it parses the data according to the detected content type. I
believe this is where the problem happens: xContentType thinks my string is
YAML, then the YAML parser fails and the exception isn't caught. Does it
make sense to have the try/catch in addComplexField also catch the YAML
exception and fall back to addSimpleField? It seems the code already does
this for JSON-related exceptions.
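
The fallback I have in mind could be sketched roughly as follows. This is
not the actual Flume code: FallbackFieldSketch, BodyParser and the Map-based
document are hypothetical stand-ins so the example runs without Flume or
Elasticsearch on the classpath. Only the names addComplexField and
addSimpleField are taken from the sink, and the simulated parser failure
stands in for whatever exception the YAML parser really throws.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FallbackFieldSketch {

    /** Hypothetical stand-in for a content-type-specific parser (JSON, YAML, ...). */
    interface BodyParser {
        Object parse(byte[] data) throws IOException;
    }

    /** Stores the raw bytes as a plain string, decoded as UTF-8. */
    static void addSimpleField(Map<String, Object> doc, String name, byte[] data) {
        doc.put(name, new String(data, StandardCharsets.UTF_8));
    }

    /**
     * Tries to store the parsed structure. If the parser fails with either a
     * checked IOException or an unchecked exception, falls back to storing
     * the body as a plain string instead of letting the exception escape.
     */
    static void addComplexField(Map<String, Object> doc, String name,
                                byte[] data, BodyParser parser) {
        try {
            doc.put(name, parser.parse(data));
        } catch (IOException | RuntimeException e) {
            // Assumption: treating any parse failure as "not structured data".
            addSimpleField(doc, name, data);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        byte[] body = "referrer.com/search/?query=\\x8D\\x91".getBytes(StandardCharsets.UTF_8);

        // Simulates a parser that was picked by content-type detection
        // and then rejects the input with an unchecked exception.
        BodyParser failingParser = data -> {
            throw new IllegalStateException("simulated parse failure");
        };

        addComplexField(doc, "body", body, failingParser);
        System.out.println(doc.get("body"));
    }
}
```

Catching RuntimeException is deliberately broad here. In the real sink it
would probably be better to catch the specific exception type the YAML
parser throws, the same way the JSON case is handled.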

Thanks,
Allan
On Thu, Jun 13, 2013 at 1:11 PM, Allan Feid <[EMAIL PROTECTED]> wrote:

> Edward,
>
> I think I've run into a very similar issue while writing a custom
> interceptor. I simply catch the exception and log it when it happens, and
> this is what the log body looks like:
>
>
> foo¤data¤1371126476.436¤0.005¤555¤10.1.1.1¤HTTP/1.1¤GET¤http¤vhost¤/path/url¤¤-¤200¤
> referrer.com/search/?query=\x8D\x91\x89\xEF\x8Bc\x8E\x96\x93\xB0
> ¤-¤-¤-
>
> I believe the query string here is the culprit (I know the ¤ character
> works fine in utf8). I think ideally there would be a way to take the
> \x8D\x91.. data and leave it as a literal string, but I currently don't
> know how to do that.
>
> Thanks,
> Allan
>
>
> On Thu, Jun 13, 2013 at 12:13 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:
>
>> Hi Allan,
>> So it appears that flogger is simply grabbing standard input and putting
>> it into the body - which is fine.
>> Can you track the error down to a specific line in your input file? I
>> would be interested to know how it is encoded.
>>
>> Cheers,
>> Edward
>>
>>
>> "
>> Edward,
>>
>> Flogger is available here:
>> https://github.com/cloudera/flume/tree/master/contrib/flogger
>>
>> I've forked it to accept multiple -t args, but it basically uses the
>> legacy thrift/rpc protocol to add events from STDIN. Both the file_roll and
>> HDFS sinks do not run into UTF-8 errors. The architecture is basically tail
>> | flogger -> local flume instance -> log processing flume instance -> {
>> hdfs, file_roll, elasticsearch }. I can send specific configs if necessary,
>> but it's all pretty standard as per the User Guide.
>>
>> Thanks,
>> Allan
>>
>>
>> On Wed, Jun 12, 2013 at 12:14 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:
>> Hi Allan,
>> I think I would run it in a debugger and look at the buffer that way. You
>> should be able to put
>>
>>
>> JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend=n"
>>
>> into your /etc/flume-ng/conf/flume-env.sh and then attach a debugger
>> with a breakpoint on
>> org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:178)
>>
>> You could try the file_roll sink but I'm not sure if it won't munge the
>> character sets itself.
>>
>> Can you send me a link to flogger and your configuration for it? I'm not
>> familiar with it.
>>
>> Cheers,
>> Edward
>>
>> "
>> Edward,
>>
>> Thanks for the reply. I'm not encoding my events in any specific
>> character set. I'm using flogger to send application logs (nodejs, ruby,
>> perl etc) into my flume infrastructure. It seems that only the
>> ElasticSearchSink encounters this issue. I'm not sure if the HDFS or file
>> roll sinks are forcing an encoding before trying to process (haven't
>> checked the code yet). Is there an easy way to have flume output the hex
>> data of an event? I'd love to provide the hex alongside the exception.
>>
>> Thanks,
>> Allan
>>
>>
>> On Tue, Jun 11, 2013 at 12:33 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Allan,
>>> I would like to see the contents of the event you are trying to store -
>>> in hex - paired with the exception that relates to that message.