Flume user mailing list: Why does a Flume source need to recognize the format of the message?


Thread:
Praveen Sripati 2013-10-22, 17:59
Jarek Jarcec Cecho 2013-10-22, 18:07
Roshan Naik 2013-10-22, 20:51
Roshan Naik 2013-10-22, 21:21
dwight.marzolf@... 2013-10-22, 21:33
Roshan Naik 2013-10-22, 22:30
dwight.marzolf@... 2013-10-23, 14:48
Re: Why does a Flume source need to recognize the format of the message?
Why don't you share the config you have so far? Perhaps somebody here can
comment on it.
On Wed, Oct 23, 2013 at 7:48 AM, <[EMAIL PROTECTED]> wrote:

>  Ok, the place where I am stuck is trying to understand what the Flume
> config file looks like to do this. What does the config for the Scribe
> source look like? I have used the config lines for a Scribe source that I
> found in the Flume docs, but I'm not seeing the Scribe source split up any
> data. If the sink is the one that splits up the data, then what do the
> config entries look like for an Avro sink that would split up the Scribe
> categories? This is back to my original question of how this is done in
> the config file.
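For reference, here is a minimal sketch of the kind of config being asked about. It assumes the Scribe source puts the Scribe category into a "category" header on each event; the agent, channel, and sink names and the category mappings are made up for illustration. With this approach the split by category happens in a multiplexing channel selector on the source side, and each Avro sink simply forwards whatever lands in its channel.

    agent1.sources = scribeSrc
    agent1.channels = chAccess chError
    agent1.sinks = avroAccess avroError

    agent1.sources.scribeSrc.type = org.apache.flume.source.scribe.ScribeSource
    agent1.sources.scribeSrc.port = 1463
    agent1.sources.scribeSrc.channels = chAccess chError

    # Route each event to a channel based on the "category" header
    agent1.sources.scribeSrc.selector.type = multiplexing
    agent1.sources.scribeSrc.selector.header = category
    agent1.sources.scribeSrc.selector.mapping.access = chAccess
    agent1.sources.scribeSrc.selector.mapping.error = chError
    agent1.sources.scribeSrc.selector.default = chError

    agent1.channels.chAccess.type = memory
    agent1.channels.chError.type = memory

    # The Avro sinks just forward whatever lands in their channel
    agent1.sinks.avroAccess.type = avro
    agent1.sinks.avroAccess.channel = chAccess
    agent1.sinks.avroAccess.hostname = collector1.example.com
    agent1.sinks.avroAccess.port = 4545

    agent1.sinks.avroError.type = avro
    agent1.sinks.avroError.channel = chError
    agent1.sinks.avroError.hostname = collector2.example.com
    agent1.sinks.avroError.port = 4545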
>
> From: ext Roshan Naik [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, October 22, 2013 6:31 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Why does a Flume source need to recognize the format of the message?
>
> The source splits the data into individual events and inserts them into
> the channel. In a few cases the sources do additional parsing of data.
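As a concrete illustration of that point (not from the thread): with a simple line-oriented source such as netcat, every newline-terminated line received on the socket becomes one event in the channel. A minimal sketch with hypothetical names:

    agent1.sources = nc
    agent1.channels = mem
    agent1.sinks = log

    # The netcat source turns each line of text received on the socket into one event
    agent1.sources.nc.type = netcat
    agent1.sources.nc.bind = 0.0.0.0
    agent1.sources.nc.port = 44444
    agent1.sources.nc.channels = mem

    agent1.channels.mem.type = memory

    agent1.sinks.log.type = logger
    agent1.sinks.log.channel = mem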
>
> On Tue, Oct 22, 2013 at 2:33 PM, <[EMAIL PROTECTED]> wrote:
>
> So, what I am gathering from the discussion below is that the Scribe
> source doesn't do the parsing or splitting of data. It just takes in the
> data flow as is and passes it on to the sink. The right sink splits the
> Scribe data up based on the category. That is a good clarification for
> me, as I saw it the other way around.
>
> Having never worked with Thrift or Avro, could you give me a sample entry
> for a Flume config file for one of these that would parse data with a
> Scribe category that is coming in via the Scribe source?
>
> Regards,
>
> dwight
>
> From: ext Roshan Naik [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, October 22, 2013 5:21 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Why does a Flume source need to recognize the format of the message?
>
> I forgot to note that the syslog source also does some parsing.
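For example, a sketch using the standard syslog TCP source: it parses the syslog priority into event headers (facility, severity) and leaves the remaining message text as the event body. The names and port below are illustrative; check the Flume docs for the exact header names.

    agent1.sources = sys
    agent1.channels = mem

    # Syslog TCP source: parses the priority into headers,
    # the remaining message text becomes the event body
    agent1.sources.sys.type = syslogtcp
    agent1.sources.sys.host = 0.0.0.0
    agent1.sources.sys.port = 5140
    agent1.sources.sys.channels = mem

    agent1.channels.mem.type = memory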
>
> On Tue, Oct 22, 2013 at 1:51 PM, Roshan Naik <[EMAIL PROTECTED]>
> wrote:
>
> At a minimum it needs to know how to split incoming data into individual
> events. Typically a newline is used as the separator.
>
>  Avro & Thrift are special-purpose sources/sinks which handle headers and
> body. Avro, Thrift & HTTP sources will parse the incoming data into header
> + body. AFAICT most other sources treat the whole thing as a body. They
> should not need any more info other than the line/event delimiter.
>
> You can write a custom deserializer, which is supported by some sources,
> to parse a custom incoming data format.
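For instance, the spooling directory source takes a pluggable deserializer: the default LINE deserializer produces one event per line, and a custom class can be supplied instead. The spool directory path and the custom class name below are hypothetical.

    agent1.sources = spool
    agent1.channels = mem

    agent1.sources.spool.type = spooldir
    agent1.sources.spool.spoolDir = /var/log/flume-spool
    agent1.sources.spool.channels = mem

    # Default deserializer: one event per line of each spooled file
    agent1.sources.spool.deserializer = LINE

    # Or plug in a custom deserializer (hypothetical class) via its Builder:
    # agent1.sources.spool.deserializer = com.example.MyFormatDeserializer$Builder

    agent1.channels.mem.type = memory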
>
> -roshan
>
> On Tue, Oct 22, 2013 at 11:07 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>
> wrote:
>
> Hi Praveen,
> I think there is some confusion between message and payload. Whereas
> Flume does not need to understand the payload structure, it does need to
> understand the message in order to know what events (what payloads) are
> in it and with what headers. To put it differently, Flume does not need to
> understand the structure of the data that you are sending (the payload is
> just a byte array to Flume), but that unknown structure needs to be
> transferred via a known protocol (such as Avro RPC).
>
> Jarcec
>
>
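A sketch of that last point: an Avro source only defines the transport framing (Avro RPC); whatever bytes the client puts into an event body pass through untouched. Names and port below are illustrative.

    agent1.sources = avroIn
    agent1.channels = mem

    # The Avro source speaks Avro RPC on the wire; event bodies stay opaque bytes
    agent1.sources.avroIn.type = avro
    agent1.sources.avroIn.bind = 0.0.0.0
    agent1.sources.avroIn.port = 41414
    agent1.sources.avroIn.channels = mem

    agent1.channels.mem.type = memory

Events can then be pushed to it with the bundled avro-client, e.g. something along the lines of flume-ng avro-client -H localhost -p 41414 -F somefile, without Flume ever interpreting the contents of the event bodies.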
> On Tue, Oct 22, 2013 at 06:59:17PM +0100, Praveen Sripati wrote:
> > According to the Flume documentation
> >
> > >>    A Flume source consumes events delivered to it by an external
> source
> > like a web server. The external source sends events to Flume in a format
> > that is recognized by the target Flume source. For example, an Avro Flume
> > source can be used to receive Avro events from Avro clients or other
> Flume
> > agents in the flow that send events from an Avro sink.
> >
> > Why does a Flume source need to recognize or understand the format of the
> > message?
Later replies:
dwight.marzolf@... 2013-10-23, 20:43
Paul Chavez 2013-10-23, 21:28
dwight.marzolf@... 2013-10-25, 13:23