Flume, mail # user - ElasticSearch logstash serializer fields stored as strings


Allan Feid 2013-06-13, 17:06
Re: ElasticSearch logstash serializer fields stored as strings
Edward Sargisson 2013-06-14, 16:10
Hi Allan,
To answer your question, "Will either of these snippets work when the value
of a header is a float?" I don't know; I've never tried. My guess is that
it probably will work, on the basis that the serializer is just shuttling
bytes without checking their representation.

You could try writing a custom serializer of your own where you explicitly
identify the header you want to write as a float and write it as a float.
Do note that the interface is changing slightly in 1.4.0.

Cheers,
Edward
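
[If you go the custom-serializer route, the core of it is deciding, per header, whether the value parses as a float before handing it to the content builder. The sketch below only shows that detection step; the class and method names are illustrative and are not part of the Flume or Elasticsearch APIs.]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of float-aware header handling. "FloatAwareHeaders"
// is an illustrative name, not a Flume class.
public class FloatAwareHeaders {

    // Returns the parsed Float if the value is a valid float, otherwise null.
    static Float asFloat(String value) {
        try {
            return Float.valueOf(value);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Demonstration: partition headers into float-valued and string-valued,
    // the way a custom serializer would before appending each field.
    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("response_time", "12.5");
        headers.put("host", "web01");

        for (Map.Entry<String, String> e : headers.entrySet()) {
            Float f = asFloat(e.getValue());
            if (f != null) {
                System.out.println(e.getKey() + " -> float " + f);
            } else {
                System.out.println(e.getKey() + " -> string " + e.getValue());
            }
        }
    }
}
```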

"Hello all,

I've been using the ElasticSearchLogStashEventSerializer with pretty great
success. It's great that it defaults to strings for all the headers, but in
some cases I would rather my data be stored as a float in ElasticSearch.
I've been digging through the code for the serializer, but I'm not so great
with Java. I noticed the following comment in the dynamic serializer:

 * A best effort will be used to determine the content-type, if it cannot be
 * determined fields will be indexed as Strings

So I looked at the code for appending headers and saw this snippet:

    Map<String, String> headers = event.getHeaders();
    for (String key : headers.keySet()) {
      ContentBuilderUtil.appendField(builder, key,
          headers.get(key).getBytes(charset));
    }

In the logstash serializer the fields portion gets set by:

    builder.startObject("@fields");
    for (String key : headers.keySet()) {
      byte[] val = headers.get(key).getBytes(charset);
      ContentBuilderUtil.appendField(builder, key, val);
    }
    builder.endObject();

Will either of these snippets work when the value of a header is a float?
If so I'd like to give it a try. The reason I'm doing all of this is to
take advantage of some Kibana features which require number based fields
(like graphing response times over time).

Thanks,
Allan"
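
[The difference Allan is after comes down to whether the value lands in the Elasticsearch document quoted or unquoted; Kibana can only aggregate and graph the unquoted, numeric form. A minimal illustration of the two resulting JSON shapes, built by hand here rather than with the real content builder, with an assumed `@fields` field name taken from the logstash serializer:]

```java
// Hypothetical illustration: the same header emitted two ways.
public class FieldTypeDemo {

    // Emitted as a string; Elasticsearch indexes it as text,
    // so Kibana cannot graph it.
    static String asStringField(String key, String value) {
        return "{\"@fields\":{\"" + key + "\":\"" + value + "\"}}";
    }

    // Emitted as a number; Kibana can graph and aggregate it.
    static String asFloatField(String key, float value) {
        return "{\"@fields\":{\"" + key + "\":" + value + "}}";
    }

    public static void main(String[] args) {
        System.out.println(asStringField("response_time", "12.5"));
        System.out.println(asFloatField("response_time", 12.5f));
    }
}
```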