Surindhar 2013-02-07, 09:52
Nitin Pawar 2013-02-07, 10:15
Surindhar 2013-02-07, 10:24
Bertrand Dechoux 2013-02-07, 10:30
Inder Pall 2013-02-07, 10:39
Mike Percy 2013-02-07, 10:59
Nitin Pawar 2013-02-07, 11:22
Steven Yates 2013-02-07, 23:04
Mike Percy 2013-02-08, 03:00
Mike Percy 2013-02-08, 02:46
Steve Yates 2013-02-08, 03:22
Nitin Pawar 2013-02-08, 04:55
Inder Pall 2013-02-08, 08:48
Mike Percy 2013-02-08, 08:56
Nitin Pawar 2013-02-08, 09:45
syates@... 2013-02-08, 11:34
Mike Percy 2013-02-08, 22:09
Re: Analysis of Data
Steven Yates 2013-02-10, 09:00
Absolutely, Mike, thank you.
Specifically, though, it would be nice to be able to feed the results from
an external process (such as Mahout or Storm) back into a Flume channel/sink.
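One way this could be sketched with stock components, assuming the external process is able to send events over Avro RPC (for example through the Flume client SDK), is to expose an Avro source as the re-entry point. The agent name `a1`, the component names `feedback`, `c1` and `k1`, and the port are hypothetical, and the logger sink is only a placeholder for a real destination.

```properties
# Hypothetical agent "a1": an Avro source used as a re-entry point for results
# produced by an external process (e.g. a Storm topology or a Mahout job).
a1.sources = feedback
a1.channels = c1
a1.sinks = k1

# Avro source listening for events sent back by the external process.
a1.sources.feedback.type = avro
a1.sources.feedback.bind = 0.0.0.0
a1.sources.feedback.port = 41414
a1.sources.feedback.channels = c1

# In-memory channel for the sketch; a file channel would give durability.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Placeholder sink that only logs the events it receives.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```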
From: Mike Percy <[EMAIL PROTECTED]>
Reply-To: <[EMAIL PROTECTED]>
Date: Fri, 8 Feb 2013 14:09:04 -0800
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Cc: Nitin Pawar <[EMAIL PROTECTED]>
Subject: Re: Analysis of Data
Any reason you are not using interceptors for that? Can you provide more
detail on what you are doing?
See more about Interceptors here:
On Fri, Feb 8, 2013 at 3:34 AM, <[EMAIL PROTECTED]> wrote:
> Hi Nitin,
> Would it be feasible to consider adding another extension point to Flume
> for custom filtering, enrichment, routing, etc.? Without trying to turn
> Flume into something it was never designed for (i.e. without going
> overboard), the concept of some sort of intermediate processing unit is
> quite attractive to me personally. I have dedicated AvroSources purely for
> aggregating data; however, in the interest of modularisation, I may want
> to perform some enrichment/filtering before I dump the events on my
> durable channel. I guess this leads to a conversation about flow and some
> sort of declarative way of configuring the ordering of the processing
> units, etc. Just thinking out loud.
> @Nitin/Mike, your experience in the field will assist in validating this
> Quoting Nitin Pawar <[EMAIL PROTECTED]>:
>> Mike, yes.
>> I am not against the approach of Flume doing it. I would love to see it as
>> part of Flume (it of course helps to remove the overload on one processing
>> engine). As Flume already supports the grouping of agents, the normal route
>> of acquisition and sink can continue.
>> In another route, we can have it sink to a processor source in Flume,
>> which then converts the data, runs a quick analysis on the data in memory,
>> and updates things like global counters, which can then be sunk to live
>> reporting systems.
>> On Fri, Feb 8, 2013 at 2:26 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>> Good to hear more of your thoughts. Please see inline.
>>> On Thu, Feb 7, 2013 at 8:55 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>> I can understand the idea of having data processed inside Flume by
>>>> streaming it to another Flume agent. But do we really need to re-engineer
>>>> something inside Flume, is what I am wondering. The core Flume dev team may
>>>> have better ideas on this, but currently Storm is a huge candidate for
>>>> streaming data processing.
>>>> Flume does have an open JIRA on this integration:
>>>> <https://issues.apache.org/jira/browse/FLUME-1286>
>>> Yes, a Storm sink could be useful. But that wouldn't preclude us from
>>> taking a hard look at what may be missing in Flume itself, right?
>>>> It will be interesting to draw up the comparisons in performance if the
>>>> data processing logic is added to Flume. We do currently see people
>>>> doing a little bit of pre-processing of their data (they have their own
>>>> custom channel types where they modify the data and sink it).
>>> It sounds like you have some experience with Flume. Are you guys using it
>>> at Rightster?
>>> I work with a lot of folks to set up and deploy Flume, many of whom do
>>> lookups / joins with other systems, transformations, etc. in real time
>>> along their data ingest pipeline before writing the data to HDFS or HBase
>>> for further processing and archival. I wouldn't say these are really heavy
>>> number crunching implementations in Flume, but certainly I see a lot of
>>> inline parsing, inspection, enrichment, routing, and the like going on. I
>>> think Flume could do a lot more, given the right abstractions.
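
The inline filtering, enrichment, and routing described above map fairly directly onto Flume's interceptor chain and multiplexing channel selector. What follows is a minimal configuration sketch, not anyone's actual setup: the agent `a1`, source `r1`, channels `c1`/`c2`, the regex, and the header names and values are all hypothetical, and the fragment omits the source type and the channel and sink definitions. Interceptors are applied in the order they are listed, which gives a simple declarative way to order the processing steps.

```properties
# Interceptors run in the order listed here.
a1.sources.r1.interceptors = ts filt tag

# Enrichment: add a "timestamp" header to each event.
a1.sources.r1.interceptors.ts.type = timestamp

# Filtering: drop events whose body matches the (illustrative) regex.
a1.sources.r1.interceptors.filt.type = regex_filter
a1.sources.r1.interceptors.filt.regex = .*DEBUG.*
a1.sources.r1.interceptors.filt.excludeEvents = true

# Enrichment: stamp every event with a static header (hypothetical key/value).
a1.sources.r1.interceptors.tag.type = static
a1.sources.r1.interceptors.tag.key = datacenter
a1.sources.r1.interceptors.tag.value = dc1

# Routing: pick a channel from a header value. This assumes senders set a
# hypothetical "eventType" header on each event.
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = eventType
a1.sources.r1.selector.mapping.audit = c1
a1.sources.r1.selector.default = c2
```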
Steven Yates 2013-02-08, 10:45