Flume >> mail # dev >> MorphlineInterceptor questions


Re: MorphlineInterceptor questions

On Nov 11, 2013, at 9:09 PM, Otis Gospodnetic wrote:

> Hi,
>
> While poking around MorphlineSolrSink I got intrigued by
> MorphlineInterceptor in ...solr.morphline package.  A few Qs:
>
> 1) This is also not Solr-specific, right?

yep

>
> 2) I couldn't find any code in ...solr.morphline package that actually
> uses this MorphlineInterceptor... is it not used?

In Flume an Interceptor is a separate concept from a Sink. You can use the Interceptor without the Sink, and vice versa.

>
> 3) I see Morphline command's "process(...)" method being called from
> both MorphlineInterceptor AND from MorphlineHandlerImpl.  How come?  My
> impression is that MorphlineHandlerImpl code is what is actually meant
> to be used, while MorphlineInterceptor doesn't seem to be used....
> what am I missing? :)
>
> 4) I found the following in the Flume Guide: "This interceptor is not
> intended for heavy duty ETL processing - if you need this consider
> moving ETL processing from the Flume Source to a Flume Sink".
> Why should one not use MorphlineInterceptor for heavy duty ETL processing?

Two reasons:

1) Interceptors run in the thread of the Flume Source, and are thus tightly coupled to the Flume Source and its I/O handler. It's safer not to block or fail in that thread. It is better to hand data off that thread as soon as possible into the Flume Channel (i.e., a queue from which sinks take events; sinks run in another thread and are thus more isolated).

2) Flume Interceptors have the limitation that they can only generate zero or one output events for each input event. So generating N events for an input event isn't possible, as one might want to do when emitting one event per input line, or one event per input column, or one event per email attachment, etc.
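
To make the second point concrete, here is a minimal sketch in plain Java. It does not use the Flume API: `Event`, `OneToOneInterceptor`, `interceptAll` and `splitLines` are hypothetical stand-ins that only mirror the zero-or-one contract described above, next to the kind of 1-to-N fan-out that the contract rules out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InterceptorCardinalitySketch {

    // Hypothetical stand-in for an event: just a text body.
    static final class Event {
        final String body;
        Event(String body) { this.body = body; }
    }

    // Hypothetical interface mirroring the zero-or-one contract:
    // return null to drop the event, or return one event to pass it on.
    interface OneToOneInterceptor {
        Event intercept(Event in);
    }

    // Applies the interceptor to a batch. The result can never hold
    // more events than the input batch did.
    static List<Event> interceptAll(OneToOneInterceptor interceptor, List<Event> batch) {
        List<Event> out = new ArrayList<>();
        for (Event e : batch) {
            Event result = interceptor.intercept(e);
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }

    // The 1-to-N case (one event per input line). This shape does not fit
    // the interface above, so it has to live somewhere else, e.g. in a sink.
    static List<Event> splitLines(Event in) {
        List<Event> out = new ArrayList<>();
        for (String line : in.body.split("\n")) {
            out.add(new Event(line));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> batch = Arrays.asList(new Event("x"), new Event(""), new Event("y"));
        // Drop empty events, keep the rest unchanged.
        List<Event> kept = interceptAll(e -> e.body.isEmpty() ? null : e, batch);
        System.out.println("kept=" + kept.size());

        List<Event> fanned = splitLines(new Event("a\nb\nc"));
        System.out.println("fanned=" + fanned.size());
    }
}
```

The filter fits the interceptor shape because each input yields at most one output. `splitLines` returns a list per input event, which is why that kind of work belongs downstream of the Channel.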

To summarize, the reasons aren't specific to morphlines; they are rooted in the way Flume has designed the concept of Interceptors.
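
The first reason (get events off the source thread quickly, do the heavy work elsewhere) can also be sketched in plain Java. This is only an illustration under assumptions: a bounded `BlockingQueue` stands in for the Flume Channel, and `ChannelHandoffSketch`, `heavyTransform` and `run` are hypothetical names, not Flume classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ChannelHandoffSketch {

    // Stand-in for expensive ETL work; it runs on the sink thread only.
    static String heavyTransform(String raw) {
        return raw.trim().toUpperCase();
    }

    static List<String> run(List<String> inputs) {
        // Bounded queue standing in for the channel between source and sink.
        // An empty Optional marks the end of the input.
        BlockingQueue<Optional<String>> channel = new ArrayBlockingQueue<>(16);
        List<String> results = new ArrayList<>();

        // Sink thread: takes events from the queue and does the heavy work.
        Thread sink = new Thread(() -> {
            try {
                while (true) {
                    Optional<String> next = channel.take();
                    if (!next.isPresent()) {
                        break;
                    }
                    results.add(heavyTransform(next.get()));
                }
            } catch (InterruptedException interrupted) {
                Thread.currentThread().interrupt();
            }
        });
        sink.start();

        // "Source" side (the calling thread): no transformation here,
        // just a hand-off into the queue.
        try {
            for (String raw : inputs) {
                channel.put(Optional.of(raw));
            }
            channel.put(Optional.empty());
            sink.join(); // results are safe to read once the sink has finished
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during hand-off", interrupted);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("  alpha ", "beta")));
    }
}
```

If `heavyTransform` is slow or throws, only the sink thread is affected. The calling thread does nothing beyond the enqueue, which is the isolation argument made above.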

Wolfgang.

>
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/