Flume, mail # dev - Possible Conflicting Information Regarding Relationship Between Channels and Sinks within Documentation

Re: Possible Conflicting Information Regarding Relationship Between Channels and Sinks within Documentation
Connor Woodson 2013-03-15, 04:21
That statement in the developer guide appears to be inaccurate; feel free
to submit a JIRA to that effect. The user guide is, I believe, updated much
more often than the developer guide, so it should be considered the more
accurate of the two (although it has some outstanding issues as well).

To go back over how each agent should work:

A source can have one or more channels. Depending on the chosen channel
selector, the source will send an event to a specific channel (multiplexing
selector) or copy it to all channels (replicating selector, the default
behavior).
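
A minimal sketch of the two selector behaviours, written as plain Java rather
than against Flume's own classes. The agent name a1, source r1, channels c1/c2
and the header "state" are hypothetical; the selector.type, selector.header,
selector.mapping.<value> and selector.default keys follow the naming used in
the Flume user guide, but the routing code itself is only a model.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical model of channel-selector routing; not Flume's own classes.
public class SelectorSketch {

    // Split a space-separated list such as "c1 c2" into channel names.
    static List<String> names(String value) {
        String trimmed = value == null ? "" : value.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Return the channel(s) that one event from the given source is written to.
    static List<String> route(Properties conf, String source, Map<String, String> headers) {
        String prefix = "a1.sources." + source + ".";
        // A source lists its channels under ".channels" (plural).
        List<String> channels = names(conf.getProperty(prefix + "channels"));
        String type = conf.getProperty(prefix + "selector.type", "replicating").trim();
        if (type.equals("replicating")) {
            return channels; // copy the event to every channel
        }
        // Multiplexing: route on the value of one configured header.
        String header = conf.getProperty(prefix + "selector.header");
        String value = header == null ? null : headers.get(header.trim());
        String mapped = value == null ? null : conf.getProperty(prefix + "selector.mapping." + value);
        // Unmapped (or missing) header values fall back to selector.default.
        return names(mapped != null ? mapped : conf.getProperty(prefix + "selector.default"));
    }

    public static void main(String[] args) throws IOException {
        String config = String.join("\n",
            "a1.sources.r1.channels = c1 c2",
            "a1.sources.r1.selector.type = multiplexing",
            "a1.sources.r1.selector.header = state",
            "a1.sources.r1.selector.mapping.CA = c1",
            "a1.sources.r1.selector.default = c2");
        Properties conf = new Properties();
        conf.load(new StringReader(config));
        System.out.println(route(conf, "r1", Map.of("state", "CA"))); // [c1]
        System.out.println(route(conf, "r1", Map.of("state", "NY"))); // [c2]
        conf.setProperty("a1.sources.r1.selector.type", "replicating");
        System.out.println(route(conf, "r1", Map.of()));              // [c1, c2]
    }
}
```

The point of the model is that fan-out to several channels happens at the
source/selector end, which is why the source key is the plural ".channels".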

Each channel goes from a single Source/Selector to a single Sink Processor
(technically to a Sink Group, and then each group has a defined Processor;
but in the configuration/user guide those two are referenced as a single
thing).
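
To make the "channel goes to a sink group, not to a sink" idea concrete, here
is a simplified in-memory model of one channel feeding a sink group whose
processor calls its sinks in round-robin order. It is only an illustration: the
Sink and RoundRobinProcessor classes are hypothetical, not Flume's API. In a real
agent the equivalent wiring would be configured with keys along the lines of
a1.sinkgroups = g1, a1.sinkgroups.g1.sinks = k1 k2 and
a1.sinkgroups.g1.processor.type = load_balance (with processor.selector =
round_robin). The constructor enforces the rule from this thread that every sink
in a group reads from the same channel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model: one channel -> one sink group -> several sinks.
public class SinkGroupSketch {

    // Stand-in for a sink: bound to exactly one channel, records what it takes.
    static class Sink {
        final String name;
        final Queue<String> channel;
        final List<String> received = new ArrayList<>();

        Sink(String name, Queue<String> channel) {
            this.name = name;
            this.channel = channel;
        }

        // Take one event from this sink's channel; false once the channel is empty.
        boolean process() {
            String event = channel.poll();
            if (event == null) {
                return false;
            }
            received.add(event);
            return true;
        }
    }

    // Stand-in for a load-balancing sink processor using round-robin selection.
    static class RoundRobinProcessor {
        private final List<Sink> sinks;
        private int next = 0;

        RoundRobinProcessor(List<Sink> sinks) {
            if (sinks.isEmpty()) {
                throw new IllegalArgumentException("a sink group needs at least one sink");
            }
            // All sinks in the group must pull from the same channel.
            for (Sink s : sinks) {
                if (s.channel != sinks.get(0).channel) {
                    throw new IllegalArgumentException(s.name + " reads from a different channel");
                }
            }
            this.sinks = sinks;
        }

        // Invoke the sinks in turn until the shared channel is drained.
        void drain() {
            while (sinks.get(next).process()) {
                next = (next + 1) % sinks.size();
            }
        }
    }

    public static void main(String[] args) {
        Queue<String> c1 = new ArrayDeque<>(List.of("e1", "e2", "e3", "e4", "e5"));
        Sink k1 = new Sink("k1", c1);
        Sink k2 = new Sink("k2", c1);
        new RoundRobinProcessor(List.of(k1, k2)).drain();
        System.out.println(k1.name + " " + k1.received); // k1 [e1, e3, e5]
        System.out.println(k2.name + " " + k2.received); // k2 [e2, e4]
    }
}
```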

The default sink processor supports only a single sink, but by explicitly
configuring a sink group/processor you can attach multiple sinks. Each sink
in a sink group must pull from the same channel, so each sink (and each sink
group) reads from a single channel. Note that the channel property for a sink
is ".channel", whereas for a source it is ".channels" with the 's'. Each
channel, however, can feed multiple sinks. Again, a channel goes to a sink
group/processor, which is not obvious from the configuration: if you have two
sinks on the same channel that are not explicitly in the same sink group, you
will get an error.
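
A small checker that encodes the rules above can help when reviewing a
configuration. This is a hypothetical sketch, not Flume's own configuration
validator: it only looks at an agent named a1, it flags sinks that do not name
exactly one channel via the singular ".channel" key, and it flags two sinks on
one channel that are not declared in the same sink group, modelling the
behaviour as described in this message rather than what a given Flume release
enforces.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical checker for the sink/channel rules described in this thread.
public class ConfigCheckSketch {

    static List<String> check(Properties conf) {
        List<String> problems = new ArrayList<>();

        // Record which sink group (if any) each sink is listed in.
        Map<String, String> groupOf = new HashMap<>();
        for (String g : conf.getProperty("a1.sinkgroups", "").trim().split("\\s+")) {
            if (g.isEmpty()) continue;
            for (String k : conf.getProperty("a1.sinkgroups." + g + ".sinks", "").trim().split("\\s+")) {
                if (!k.isEmpty()) groupOf.put(k, g);
            }
        }

        Map<String, String> firstSinkOnChannel = new HashMap<>();
        for (String k : conf.getProperty("a1.sinks", "").trim().split("\\s+")) {
            if (k.isEmpty()) continue;
            // Sinks use the singular ".channel" key and must name exactly one channel.
            String ch = conf.getProperty("a1.sinks." + k + ".channel", "").trim();
            if (ch.isEmpty() || ch.split("\\s+").length > 1) {
                problems.add(k + ": must name exactly one channel via .channel");
                continue;
            }
            // A second sink on the same channel must share the first sink's group.
            String other = firstSinkOnChannel.putIfAbsent(ch, k);
            if (other != null && (groupOf.get(k) == null || !groupOf.get(k).equals(groupOf.get(other)))) {
                problems.add(k + " and " + other + " share " + ch + " but are not in one sink group");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        String config = String.join("\n",
            "a1.sources = r1",
            "a1.channels = c1",
            "a1.sinks = k1 k2",
            "a1.sources.r1.channels = c1",
            "a1.sinks.k1.channel = c1",
            "a1.sinks.k2.channel = c1");
        Properties conf = new Properties();
        conf.load(new StringReader(config));
        System.out.println(check(conf)); // [k2 and k1 share c1 but are not in one sink group]

        conf.setProperty("a1.sinkgroups", "g1");
        conf.setProperty("a1.sinkgroups.g1.sinks", "k1 k2");
        conf.setProperty("a1.sinkgroups.g1.processor.type", "load_balance");
        System.out.println(check(conf)); // []
    }
}
```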

- Connor
On Thu, Mar 14, 2013 at 8:50 PM, Israel Ekpo <[EMAIL PROTECTED]> wrote:

> Hey guys,
>
> I have a quick question that I would like to ask based on what I found
> within the user and developer documentation.
>
> This could cause some confusion for first-time folks.
>
> *Background:*
>
> From the documentation, a Flume source accepts event data and sends it into
> a channel.
>
> These event data are queued up in the channel.
>
> A sink takes data from the channel for processing (forwarding to another
> agent's source or central repo).
>
> Furthermore, *there can be one source, one or more channels, and one or
> more sinks for each agent.*
>
> Within an agent, *a flume source can write to multiple channels, but a sink
> can pull events from only one channel.*
>
> Hence, within this context, the relationship between a source and channel
> could be one to many but the relationship between a sink and channel is
> always one-to-one.
>
> *Potential Conflicting Information in Documentation*:
>
> On this page,
> http://flume.apache.org/FlumeUserGuide.html#defining-the-flow
>
> It states that *"A source instance can specify multiple channels, but a
> sink instance can only specify one channel."*
>
>
> However, on this page,
> http://flume.apache.org/FlumeDeveloperGuide.html#sink
>
> I noticed the following sentence:
>
> *A Sink is associated with one or more Channels, as configured in the Flume
> properties file.*
>
>
> *Question and Next Steps*:
>
> Within what context is this an accurate statement for a sink instance?
>
> From the context of a single agent, is this an accurate statement? If not,
> can I create a JIRA issue and submit a patch to correct it?
>