NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Flume dev mailing list: Possible Conflicting Information Regarding Relationship Between Channels and Sinks within Documentation


Israel Ekpo 2013-03-15, 03:50
Re: Possible Conflicting Information Regarding Relationship Between Channels and Sinks within Documentation
That statement in the developer guide appears to be inaccurate; feel free
to submit a JIRA to that effect. I believe the user guide is updated far
more often than the developer guide and should therefore be considered
more accurate (although I believe it still has some outstanding issues).

To recap how each agent should work:

A source can have one or more channels. Depending on the chosen channel
selector, the source will send an event to a specific channel (multiplexing
selector) or copy it to all channels (replicating selector, the default
behavior).
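
To make the two selector behaviors concrete, here is a minimal Python sketch. It is a toy model, not Flume's Java API: the names `Event`, `ReplicatingSelector`, `MultiplexingSelector` and `deliver` are hypothetical, and the multiplexing parameters only loosely mirror the `header`, `mapping` and `default` selector properties described in the user guide.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Toy event: a header map plus an opaque body."""
    headers: dict
    body: bytes


class ReplicatingSelector:
    """Copy every event to all of the source's channels (the default)."""

    def __init__(self, channels):
        self.channels = list(channels)

    def select(self, event):
        return list(self.channels)


class MultiplexingSelector:
    """Route an event by the value of one header (simplified model)."""

    def __init__(self, header, mapping, default):
        self.header = header    # header name to inspect
        self.mapping = mapping  # header value -> list of channel names
        self.default = default  # channels used when no mapping matches

    def select(self, event):
        value = event.headers.get(self.header)
        return list(self.mapping.get(value, self.default))


def deliver(selector, event, queues):
    """Append the event to the queue of every channel the selector picks."""
    for name in selector.select(event):
        queues[name].append(event)
```

With the header `state = NY`, the replicating selector puts the event on both `c1` and `c2`, while a multiplexing selector mapping `NY` to `c2` sends it only there; an event whose header matches no mapping falls back to the default channels.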

Each channel runs from a single Source/Selector to a single Sink Processor
(technically to a Sink Group, each of which has a defined Processor; the
configuration and the user guide, however, refer to the two as a single
thing).
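
The channel-to-group pairing described above can be sketched as a small data model in which every channel is wired to exactly one sink group, and the group carries its processor type. This is a hypothetical illustration of the relationship as described in this reply; `SinkGroup`, `Topology` and `connect` are invented names, not Flume classes.

```python
from dataclasses import dataclass, field


@dataclass
class SinkGroup:
    """A sink group and its processor, treated as a single unit."""
    name: str
    # Flume's processor types include "default", "failover" and "load_balance".
    processor: str = "default"
    sinks: list = field(default_factory=list)


@dataclass
class Topology:
    """Toy model of one agent: channel name -> the one sink group it feeds."""
    channel_to_group: dict = field(default_factory=dict)

    def connect(self, channel: str, group: SinkGroup) -> None:
        current = self.channel_to_group.get(channel)
        # Re-attaching the same group is harmless; a second group is not.
        if current is not None and current.name != group.name:
            raise ValueError(
                f"channel {channel!r} already feeds sink group {current.name!r}"
            )
        self.channel_to_group[channel] = group
```

Modelling the group rather than the individual sink as the channel's consumer is what makes "one channel, many sinks" and "one channel per sink processor" consistent with each other.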

The default sink processor only supports a single sink, but by explicitly
configuring a sink group/processor you can support multiple sinks. Every
sink in a sink group must pull from the same channel, so each sink (and
each sink group) reads from a single channel. Note that the channel property
for a sink is ".channel", whereas for a source it is ".channels", with the
's'. Each channel, however, can feed multiple sinks (again, a channel really
goes to a sink group/processor, which is not obvious from the configuration;
if you have two sinks that are not explicitly in the same sink group, you
will get an error).
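
The property-name difference and the sink-group rule can be checked mechanically. The sketch below assumes the flat key layout used in the Flume user guide's properties examples (`<agent>.sources.<src>.channels`, `<agent>.sinks.<sink>.channel`, `<agent>.sinkgroups.<group>.sinks`); `parse_agent` and `check_sink_groups` are hypothetical helpers, and rejecting a multi-valued `.channel` is this sketch's own validation rule, not documented Flume behavior.

```python
def parse_agent(props, agent):
    """Return (source -> channels, sink -> channel) from flat properties.

    Sources use the plural '.channels' (a space-separated list);
    sinks use the singular '.channel' and may name only one channel.
    """
    source_channels = {}
    sink_channel = {}
    for key, value in props.items():
        parts = key.split(".")
        # Only keys shaped like <agent>.<kind>.<name>.<prop> are relevant here.
        if len(parts) != 4 or parts[0] != agent:
            continue
        _, kind, name, prop = parts
        if kind == "sources" and prop == "channels":
            source_channels[name] = value.split()
        elif kind == "sinks" and prop == "channel":
            names = value.split()
            if len(names) != 1:
                raise ValueError(
                    f"sink {name!r} must name exactly one channel, got {names}"
                )
            sink_channel[name] = names[0]
    return source_channels, sink_channel


def check_sink_groups(props, agent, sink_channel):
    """Raise if the sinks listed in one sink group read different channels."""
    prefix = f"{agent}.sinkgroups."
    for key, value in props.items():
        if key.startswith(prefix) and key.endswith(".sinks"):
            group = key[len(prefix):-len(".sinks")]
            channels = {sink_channel[s] for s in value.split() if s in sink_channel}
            if len(channels) > 1:
                raise ValueError(
                    f"sink group {group!r} mixes channels {sorted(channels)}"
                )
```

For example, `a1.sources.r1.channels = c1 c2` parses to two channels for `r1`, while a sink group listing `k1` (on `c1`) and `k3` (on `c2`) is rejected.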

- Connor
On Thu, Mar 14, 2013 at 8:50 PM, Israel Ekpo <[EMAIL PROTECTED]> wrote:

> Hey guys,
>
> I have a quick question that I would like to ask based on what I found
> within the user and developer documentation.
>
> This could cause some confusion for first-time folks.
>
> *Background:*
>
> From the documentation, a Flume source accepts event data and sends it into
> a channel.
>
> These event data are queued up in the channel.
>
> A sink takes data from the channel for processing (forwarding it to another
> agent's source or to a central repository).
>
> Furthermore, *there can be one source, one or more channels, and one or
> more sinks for each agent.*
>
> Within an agent, *a flume source can write to multiple channels, but a sink
> can pull events from only one channel.*
>
> Hence, within this context, the relationship between a source and a channel
> could be one-to-many, but the relationship between a sink and a channel is
> always one-to-one.
>
> *Potential Conflicting Information in Documentation*:
>
> On this page,
> http://flume.apache.org/FlumeUserGuide.html#defining-the-flow
>
> It states that *"A source instance can specify multiple channels, but a
> sink instance can only specify one channel."*
>
>
> However, on this page,
> http://flume.apache.org/FlumeDeveloperGuide.html#sink
>
> I noticed the following sentence:
>
> *A Sink is associated with one or more Channels, as configured in the Flume
> properties file.*
>
>
> *Question and Next Steps*:
>
> Within what context is this an accurate statement for a sink instance?
>
> In the context of a single agent, is this statement accurate? If not,
> can I create a JIRA issue and submit a patch to correct it?
>