Flume >> mail # user >> Analysis of Data


Surindhar 2013-02-07, 09:52
Nitin Pawar 2013-02-07, 10:15
Surindhar 2013-02-07, 10:24
Bertrand Dechoux 2013-02-07, 10:30
Inder Pall 2013-02-07, 10:39
Mike Percy 2013-02-07, 10:59
Nitin Pawar 2013-02-07, 11:22
Steven Yates 2013-02-07, 23:04
Mike Percy 2013-02-08, 03:00
Mike Percy 2013-02-08, 02:46
Steve Yates 2013-02-08, 03:22
Nitin Pawar 2013-02-08, 04:55
Inder Pall 2013-02-08, 08:48
Mike Percy 2013-02-08, 08:56
Nitin Pawar 2013-02-08, 09:45
syates@... 2013-02-08, 11:34
Mike Percy 2013-02-08, 22:09
Re: Analysis of Data
Absolutely, Mike, thank you.

Specifically, though, it would be nice to be able to feed the results from
an external process (such as Mahout or Storm) back into a Flume channel/sink.

-Steve

From:  Mike Percy <[EMAIL PROTECTED]>
Reply-To:  <[EMAIL PROTECTED]>
Date:  Fri, 8 Feb 2013 14:09:04 -0800
To:  "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Cc:  Nitin Pawar <[EMAIL PROTECTED]>
Subject:  Re: Analysis of Data

Steven,
Any reason you are not using interceptors for that? Can you provide more
detail on what you are doing?

See more about Interceptors here:
http://flume.apache.org/FlumeUserGuide.html#flume-interceptors
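
As a rough illustration of the interceptor approach for the case described in this thread (an Avro source feeding a durable channel), an agent configuration might look like the sketch below. The agent and component names (a1, r1, c1), the port, the header key/value, and the regex are assumptions chosen for the example; only the interceptor types (static, regex_filter) and their properties come from the Flume User Guide linked above.

```properties
# Hypothetical agent "a1" with one Avro source and one file (durable) channel.
a1.sources = r1
a1.channels = c1

# Avro source used purely for aggregating incoming events (port is an assumption).
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sources.r1.channels = c1

# Interceptors run in the order listed, before events reach the channel.
a1.sources.r1.interceptors = enrich filter

# Enrichment: add a fixed header to every event (key and value are examples).
a1.sources.r1.interceptors.enrich.type = static
a1.sources.r1.interceptors.enrich.key = datacenter
a1.sources.r1.interceptors.enrich.value = example-dc

# Filtering: drop events whose body matches the regex (pattern is an example).
a1.sources.r1.interceptors.filter.type = regex_filter
a1.sources.r1.interceptors.filter.regex = ^DEBUG
a1.sources.r1.interceptors.filter.excludeEvents = true

# Durable channel that receives only the enriched, filtered events.
a1.channels.c1.type = file
```

The order of names on the interceptors line is what determines the processing order, which is the closest thing in this sketch to a declarative ordering of processing units.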

Regards
Mike
On Fri, Feb 8, 2013 at 3:34 AM,  <[EMAIL PROTECTED]> wrote:
> Hi Nitin,
>
> Would it be feasible to consider adding another extension point to Flume for
> custom filtering, enrichment, routing, etc.? Without trying to turn Flume
> into something it was never designed for (i.e. without going overboard), the
> concept of some sort of intermediate processing unit is quite attractive to
> me personally. I have dedicated AvroSources purely for aggregating data;
> however, in the interest of modularisation, I may want to perform some
> enrichment/filtering before I dump the events on my durable channel. I guess
> this becomes a conversation about flow and some sort of declarative way of
> configuring the ordering of the processing units, etc. Just thinking out
> loud.
>
>
> @Nitin/Mike, your experience in the field will assist in validating this
> further.
>
> -Steve
>
> Quoting Nitin Pawar <[EMAIL PROTECTED]>:
>
>> Mike, yes.
>>
>> I am not against Flume doing it. I would love to see it as part of Flume
>> (it would of course help take load off a single processing engine). As
>> Flume already supports grouping of agents, the normal route of
>> acquisition and sink can continue.
>>
>> On another route, we can have it sink to a Flume processor source, which
>> then converts the data, runs quick analysis on it in memory, and updates
>> things like global counters, which can then be sunk to live reporting
>> systems.
>>
>> Thanks,
>> Nitin
>>
>>
>> On Fri, Feb 8, 2013 at 2:26 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>
>>> Nitin,
>>> Good to hear more of your thoughts. Please see inline.
>>>
>>> On Thu, Feb 7, 2013 at 8:55 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>
>>>> I can understand the idea of having data processed inside Flume by
>>>> streaming it to another Flume agent. But do we really need to
>>>> re-engineer something inside Flume, is what I am thinking? The core
>>>> Flume dev team may have better ideas on this, but currently Storm is a
>>>> huge candidate for streaming data processing.
>>>> Flume does have an open JIRA on this integration:
>>>> FLUME-1286 <https://issues.apache.org/jira/browse/FLUME-1286>
>>>>
>>>
>>> Yes, a Storm sink could be useful. But that wouldn't preclude us from
>>> taking a hard look at what may be missing in Flume itself, right?
>>>
>>>> It will be interesting to draw up the comparisons in performance if the
>>>> data processing logic is added to Flume. We currently see people doing
>>>> a little pre-processing of their data (they have their own custom
>>>> channel types where they modify the data and sink it).
>>>>
>>>
>>> It sounds like you have some experience with Flume. Are you guys using it
>>> at Rightster?
>>>
>>> I work with a lot of folks to set up and deploy Flume, many of whom do
>>> lookups / joins with other systems, transformations, etc. in real time
>>> along their data ingest pipeline before writing the data to HDFS or HBase
>>> for further processing and archival. I wouldn't say these are really heavy
>>> number-crunching implementations in Flume, but I certainly see a lot of
>>> inline parsing, inspection, enrichment, routing, and the like going on. I
>>> think Flume could do a lot more, given the right abstractions.
Steven Yates 2013-02-08, 10:45