Flume, mail # dev - Questions about Morphline Solr Sink structure

Re: Questions about Morphline Solr Sink structure
Otis Gospodnetic 2013-11-11, 00:42

One more "proactive" question.

Isn't all the code under the .... solr/morphline package not really
about a Morphline *Solr* Sink, but rather about a *Morphline* Sink?
In other words, if the place Morphline actually outputs to is dictated
by a command in the Morphline config (e.g. loadSolr()), then as far
as Flume is concerned, isn't that really just a *Morphline* Sink?

For example, if I wanted to get Flume to pass events through Morphline
and have Morphline output to Elasticsearch, I wouldn't really want to
add a whole new Elasticsearch Morphline Sink.  I should really just be
able to use the existing (misnamed?) Morphline Solr Sink and just
point it to a Morphline config that has loadElasticsearch() instead of

(please ignore the fact Morphline doesn't actually have
loadElasticsearch() yet - I think this is a Morphline issue, not a
Flume issue)
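
To make the idea concrete, here is a rough Java sketch of what I have in
mind. Every name in it (Command, GenericMorphlineSink, buildChain) is
hypothetical and made up for illustration; none of this is the actual
Flume or Morphline API, and loadElasticsearch() is assumed to exist only
for the sake of the example. The point is just that the sink class stays
the same and only the final command named in the config changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MorphlineSinkSketch {

    // Hypothetical stand-in for one morphline command (NOT the real Morphline API).
    interface Command {
        void process(Map<String, String> record);
    }

    // Hypothetical destination-agnostic sink: it only hands events to whatever
    // command chain the config produced and never mentions Solr or Elasticsearch.
    static class GenericMorphlineSink {
        private final Command chain;

        GenericMorphlineSink(Command chain) {
            this.chain = chain;
        }

        void handle(Map<String, String> event) {
            chain.process(event);
        }
    }

    // Stand-in for parsing a morphline config: the final command name alone
    // decides where records end up. Destinations are simulated by a log list.
    static Command buildChain(String finalCommand, List<String> destinationLog) {
        switch (finalCommand) {
            case "loadSolr":
                return record -> destinationLog.add("solr:" + record.get("id"));
            case "loadElasticsearch": // assumed; not an existing Morphline command
                return record -> destinationLog.add("elasticsearch:" + record.get("id"));
            default:
                throw new IllegalArgumentException("Unknown command: " + finalCommand);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Map<String, String> event = new HashMap<>();
        event.put("id", "42");

        // Same sink class both times; only the "config" differs.
        new GenericMorphlineSink(buildChain("loadSolr", log)).handle(event);
        new GenericMorphlineSink(buildChain("loadElasticsearch", log)).handle(event);

        System.out.println(log);
    }
}
```

If the real sink already behaves like GenericMorphlineSink here, then
nothing in it needs to know about Solr at all.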

Is the above correct?

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Sun, Nov 10, 2013 at 7:29 PM, Otis Gospodnetic wrote:
> Hello,
> Warning: I'm a Flume NG and Morphlines newbie.
> I was looking at Morphline Solr Sink to see how one could write an
> equivalent Morphline Elasticsearch Sink, but after looking at the
> code, I'm a bit confused.  Here are my Qs:
> 1)  interface MorphlineHandler mentions Solr in N places, but it
> doesn't seem to be Solr-specific.  Couldn't one reuse this interface
> for a Morphline ES Sink?
> 2) In general, couldn't/shouldn't a few classes from
> org.apache.flume.sink.solr.morphline package really live outside
> anything solr-specific? e.g.  org.apache.flume.sink.morphline for
> those that are Morphline-specific?
> 3) Similarly, BlobDeserializer and BlobHandler don't seem to be even
> Morphline-specific.  Shouldn't they be elsewhere?
> 4) I was expecting to see SolrJ (Solr Java client library) being used
> in MorphlineHandlerImpl or MorphlineSolrSink to send events to Solr,
> but there is no trace of SolrJ there.  How exactly does this load
> Flume events into Solr then?
> Ooooh, is that because when using this sink one is supposed to provide
> a Morphline config and this config has a hard-coded loadSolr()
> command?
> 5) Would it make sense to refactor any of the current Morphline Solr
> Sink code to make it easier to add things like a Morphline Elasticsearch
> Sink?  If so, any guidance you could provide would be very helpful.
> Thanks,
> Otis