Re: Analysis of Data
Good to hear more of your thoughts. Please see inline.

On Thu, Feb 7, 2013 at 8:55 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> I can understand the idea of having data processed inside flume by
> streaming it to another flume agent. But do we really need to re-engineer
> something inside flume is what I am thinking? Core flume dev team may have
> better ideas on this but currently for streaming data processing storm is a
> huge candidate.
> flume does have an open jira on this integration: FLUME-1286 <https://issues.apache.org/jira/browse/FLUME-1286>

Yes, a Storm sink could be useful. But that wouldn't preclude us from
taking a hard look at what may be missing in Flume itself, right?

> It will be interesting to draw up the comparisons in performance if the
> data processing logic is added to flume. We do see currently people
> having a little bit of pre-processing of their data (they have their own
> custom channel types where they modify the data and sink it)

It sounds like you have some experience with Flume. Are you guys using it
at Rightster?

I work with a lot of folks to set up and deploy Flume, many of whom do
lookups / joins with other systems, transformations, etc. in real time
along their data ingest pipeline before writing the data to HDFS or HBase
for further processing and archival. I wouldn't say these are really heavy
number crunching implementations in Flume, but certainly I see a lot of
inline parsing, inspection, enrichment, routing, and the like going on. I
think Flume could do a lot more, given the right abstractions.
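
To make the "inline parsing, enrichment, routing" pattern concrete, here is a
minimal sketch in plain Python. It does not use Flume's API: the Event class,
the HOST_TO_DATACENTER table, the "host level message" body layout, and the
route names are all hypothetical stand-ins chosen for illustration. In Flume
terms this is roughly the kind of work an interceptor does when it sets
headers that a multiplexing channel selector can later route on.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Hypothetical stand-in for an event: a byte body plus string headers."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


# Assumed in-memory table standing in for a lookup against another system.
HOST_TO_DATACENTER = {"web01": "us-east", "web02": "eu-west"}


def enrich(event: Event) -> Event:
    """Parse an assumed 'host level message' body and attach routing headers."""
    parts = event.body.decode("utf-8", errors="replace").split(" ", 2)
    if len(parts) < 3:
        # Inspection: bodies that do not match the assumed layout are flagged.
        event.headers["route"] = "unparsed"
        return event
    host, level, _message = parts
    # Parsing: expose the fields as headers.
    event.headers["host"] = host
    event.headers["level"] = level
    # Enrichment: join against the lookup table.
    event.headers["datacenter"] = HOST_TO_DATACENTER.get(host, "unknown")
    # Routing: pick a destination based on the parsed level.
    event.headers["route"] = "alerts" if level == "ERROR" else "archive"
    return event
```

For example, enrich(Event(b"web01 ERROR disk full")) leaves the body untouched
and adds headers marking the event for the "alerts" route with the datacenter
filled in from the lookup table.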