Flume user mailing list: Analysis of Data

Surindhar 2013-02-07, 09:52
Nitin Pawar 2013-02-07, 10:15
Surindhar 2013-02-07, 10:24
Bertrand Dechoux 2013-02-07, 10:30
Inder Pall 2013-02-07, 10:39
Mike Percy 2013-02-07, 10:59
Nitin Pawar 2013-02-07, 11:22
Steven Yates 2013-02-07, 23:04
Mike Percy 2013-02-08, 03:00
Mike Percy 2013-02-08, 02:46
Steve Yates 2013-02-08, 03:22
Nitin Pawar 2013-02-08, 04:55
Inder Pall 2013-02-08, 08:48
Re: Analysis of Data
Good to hear more of your thoughts. Please see inline.

On Thu, Feb 7, 2013 at 8:55 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> I can understand the idea of having data processed inside Flume by
> streaming it to another Flume agent. But do we really need to re-engineer
> something inside Flume? That is what I am wondering. The core Flume dev team
> may have better ideas on this, but currently Storm is a strong candidate
> for streaming data processing.
> Flume does have an open JIRA on this integration: FLUME-1286 (https://issues.apache.org/jira/browse/FLUME-1286)

Yes, a Storm sink could be useful. But that wouldn't preclude us from
taking a hard look at what may be missing in Flume itself, right?

> It will be interesting to compare performance if the
> data processing logic is added to Flume. We do currently see people
> doing a little pre-processing of their data (they have their own
> custom channel types where they modify the data and sink it).

It sounds like you have some experience with Flume. Are you guys using it
at Rightster?

I work with a lot of folks to set up and deploy Flume, many of whom do
lookups/joins with other systems, transformations, etc. in real time
along their data ingest pipeline before writing the data to HDFS or HBase
for further processing and archival. I wouldn't say these are really heavy
number-crunching implementations in Flume, but I certainly see a lot of
inline parsing, inspection, enrichment, routing, and the like going on. I
think Flume could do a lot more, given the right abstractions.
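The inline parse, enrich, and route chain described above can be sketched as a sequence of small stages applied to each event on its way to storage. This is a minimal, language-agnostic sketch in Python, not Flume's API: the `Event` class (modeled loosely on Flume events, which carry a header map and a byte body), the stage functions, the `user_id,action` line format, the region lookup table, and the `dest` header are all hypothetical. In Flume itself this kind of logic would typically be written in Java, for example as a custom interceptor.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for a Flume-style event: string headers plus a raw byte body."""
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


# A stage takes an event and returns it (possibly modified), or None to drop it.
Stage = Callable[[Event], Optional[Event]]


def parse(event: Event) -> Optional[Event]:
    # Hypothetical log format "<user_id>,<action>"; malformed lines are dropped.
    parts = event.body.decode("utf-8").strip().split(",")
    if len(parts) != 2:
        return None
    event.headers["user_id"], event.headers["action"] = parts
    return event


def make_enricher(user_regions: Dict[str, str]) -> Stage:
    # Simulates a lookup/join against another system using an in-memory dict.
    def enrich(event: Event) -> Event:
        event.headers["region"] = user_regions.get(event.headers["user_id"], "unknown")
        return event
    return enrich


def route(event: Event) -> Event:
    # Only tags a destination header; a header-based selector downstream would act on it.
    event.headers["dest"] = "hbase" if event.headers["action"] == "purchase" else "hdfs"
    return event


def run_pipeline(events: List[Event], stages: List[Stage]) -> List[Event]:
    # Apply every stage in order; stop early for an event as soon as a stage drops it.
    out: List[Event] = []
    for event in events:
        current: Optional[Event] = event
        for stage in stages:
            current = stage(current)
            if current is None:
                break
        if current is not None:
            out.append(current)
    return out


if __name__ == "__main__":
    events = [Event(body=b"u1,purchase"), Event(body=b"u2,view"), Event(body=b"garbage")]
    for e in run_pipeline(events, [parse, make_enricher({"u1": "eu"}), route]):
        print(e.headers)
    # {'user_id': 'u1', 'action': 'purchase', 'region': 'eu', 'dest': 'hbase'}
    # {'user_id': 'u2', 'action': 'view', 'region': 'unknown', 'dest': 'hdfs'}
```

Returning `None` lets any stage filter out bad input, and having the routing stage set a header rather than pick a sink directly keeps the routing decision separate from delivery.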

Nitin Pawar 2013-02-08, 09:45
syates@... 2013-02-08, 11:34
Mike Percy 2013-02-08, 22:09
Steven Yates 2013-02-10, 09:00
Steven Yates 2013-02-08, 10:45