Flume, mail # user - Memory Channel


Re: Memory Channel
Juhani Connolly 2013-01-17, 02:00
The channel is a temporary storage device that decouples the source from
the sink.

Data is added to it and removed from it through transactions that
either put or take one or more events. Sources put data in and sinks
take it out.
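
To make the put/take idea concrete, here is a toy model in Python. It
is only a sketch: ToyMemoryChannel and ToyTransaction are made-up names
for illustration, not Flume classes, and the real channel has more to it
(capacity limits, locking and so on).

```python
from collections import deque


class ToyMemoryChannel:
    """Hypothetical in-memory channel; not Flume's MemoryChannel class."""

    def __init__(self):
        self.queue = deque()  # committed events, held only in RAM

    def transaction(self):
        return ToyTransaction(self)


class ToyTransaction:
    """Stages puts and takes; nothing is final until commit()."""

    def __init__(self, channel):
        self.channel = channel
        self.puts = []   # events staged by a source
        self.takes = []  # events handed to a sink, not yet confirmed

    def put(self, event):
        self.puts.append(event)

    def take(self):
        if not self.channel.queue:
            return None
        event = self.channel.queue.popleft()
        self.takes.append(event)
        return event

    def commit(self):
        # Staged puts become visible; taken events are gone for good.
        self.channel.queue.extend(self.puts)
        self.puts, self.takes = [], []

    def rollback(self):
        # Staged puts are dropped; taken events go back to the front,
        # in their original order.
        self.channel.queue.extendleft(reversed(self.takes))
        self.puts, self.takes = [], []


channel = ToyMemoryChannel()

tx = channel.transaction()   # source side: put a batch of two events
tx.put("event-1")
tx.put("event-2")
tx.commit()

tx = channel.transaction()   # sink side: take the batch
batch = [tx.take(), tx.take()]
tx.rollback()                # pretend the write to the destination failed
```

After the rollback both events are back in the toy channel, which is the
point of doing takes inside a transaction: a failed sink write does not
lose the batch.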

When the source receives a batch, it stores it in the channel. If this
is a memory channel, the only guarantee is that all the events are now
held in memory on this agent.

When a sink then processes a batch of data, that data is removed from
the channel once the sink commits the transaction. If the sink is a
RollingFileSink or other similar physical media sink, at this point you
could consider the data as having been sync'ed.

The timing of the sink's process() calls, each of which handles a batch
of events (what you are referring to as syncing), is governed by the
sink runner, which has its own thread.
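
Roughly, the sink runner is a loop on its own thread that keeps calling
process(). The Python below is a sketch of that shape only: the names
(channel, written, process, sink_runner), the batch size of 2 and the
10 ms back-off are all assumptions made for the example, not Flume's
actual code or settings.

```python
import queue
import threading
import time

channel = queue.Queue()   # stand-in for the channel
written = []              # stand-in for the sink's destination


def process(batch_size=2):
    """Take up to batch_size events; return False if there was nothing to do."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(channel.get_nowait())
        except queue.Empty:
            break
    written.extend(batch)  # "write" the whole batch in one go
    return bool(batch)


def sink_runner(stop):
    # Runs on its own thread, independent of whatever fills the channel.
    while not stop.is_set():
        if not process():
            time.sleep(0.01)  # back off briefly when the channel is empty


stop = threading.Event()
runner = threading.Thread(target=sink_runner, args=(stop,))
runner.start()

for i in range(5):        # the "source" puts events at its own pace
    channel.put(f"event-{i}")

deadline = time.monotonic() + 5
while len(written) < 5 and time.monotonic() < deadline:
    time.sleep(0.01)      # wait for the runner to drain the channel

stop.set()
runner.join()
```

The source never calls process() itself. Events sit in the channel until
the runner thread gets around to them.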

If your source is generating data faster than your sink can process it,
there can be an increasing delay between an event being put in the
channel and getting "sync"ed to HDFS or wherever the sink writes. This
can often be resolved by increasing thread counts or adding more sinks,
but the cause may also be HDFS or your disk simply being too slow.
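
If the sink never catches up, the channel eventually fills. Hari's reply
quoted further down describes what happens then: put() blocks for up to
keep-alive seconds and then throws ChannelException. Here is a small
Python sketch of that behaviour. BoundedChannel and this ChannelException
are stand-ins written for the example, and the capacity of 2 and the
0.05 second keep-alive are arbitrary values, not Flume defaults.

```python
import queue


class ChannelException(Exception):
    """Stand-in for the exception a full channel raises."""


class BoundedChannel:
    """Hypothetical channel with a capacity and a keep-alive timeout."""

    def __init__(self, capacity, keep_alive):
        self.events = queue.Queue(maxsize=capacity)
        self.keep_alive = keep_alive  # seconds to wait for free space

    def put(self, event):
        try:
            # Block for at most keep_alive seconds if the channel is full.
            self.events.put(event, timeout=self.keep_alive)
        except queue.Full:
            raise ChannelException("channel is full") from None

    def take(self):
        return self.events.get_nowait()


channel = BoundedChannel(capacity=2, keep_alive=0.05)
channel.put("a")
channel.put("b")

try:
    channel.put("c")      # no sink is draining, so this times out
    overflowed = False
except ChannelException:
    overflowed = True

channel.take()            # a sink frees one slot...
channel.put("c")          # ...and the put now succeeds
```

So capacity is a ceiling on what is held in memory, not a threshold that
triggers a flush.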

On 01/17/2013 04:03 AM, Mohit Anchlia wrote:
> Just one more question: when I write using the memory channel, does
> that write immediately get written to the sink? It may not get synced
> on HDFS, but does it at least get written immediately? I am trying to
> see if the events are held in Flume's memory or not.
>
> On Wed, Jan 16, 2013 at 11:00 AM, Brock Noland wrote:
>
>     The HDFS Sink syncs at the end of each batch or when the file rolls.
>
>     On Wed, Jan 16, 2013 at 10:55 AM, Nitin Pawar wrote:
>     > you can configure it as you need
>     > number of events
>     > rollover by time
>     > and other ways as well
>     >
>     >
>     > On Thu, Jan 17, 2013 at 12:17 AM, Mohit Anchlia wrote:
>     >>
>     >> Right. I was asking about sync to "sink". My sink is HDFS, so does
>     >> Flume sync to HDFS on every write operation?
>     >>
>     >>
>     >> On Wed, Jan 16, 2013 at 10:26 AM, Brock Noland wrote:
>     >>>
>     >>> Memory Channel does not write to disk and as such never syncs to
>     >>> disk. File Channel does sync to disk for each batch put on or
>     >>> taken off the channel.
>     >>>
>     >>> On Wed, Jan 16, 2013 at 10:21 AM, Mohit Anchlia wrote:
>     >>> > Thanks! What I am really trying to understand is when Flume
>     >>> > syncs to the sink. I am not using batch events.
>     >>> >
>     >>> >
>     >>> > On Wed, Jan 16, 2013 at 9:55 AM, Hari Shreedharan wrote:
>     >>> >>
>     >>> >> It means that the channel can store that many events. If it is
>     >>> >> full, the put() calls (on the source side) will start throwing
>     >>> >> ChannelException. A put call blocks for at most keep-alive
>     >>> >> seconds, after which it throws.
>     >>> >>
>     >>> >>
>     >>> >> Hari
>     >>> >>
>     >>> >> --
>     >>> >> Hari Shreedharan
>     >>> >>
>     >>> >> On Wednesday, January 16, 2013 at 9:46 AM, Mohit Anchlia wrote:
>     >>> >>
>     >>> >> Could someone help me understand the capacity attribute of
>     >>> >> memoryChannel? Does it mean that memoryChannel flushes to the
>     >>> >> sink only when this capacity is reached, or does it mean that it
>     >>> >> is the maximum number of events stored in memory and the call
>     >>> >> blocks until space is freed?
>     >>> >>
>     >>> >>
>     >>> >> http://flume.apache.org/FlumeUserGuide.html#memory-channel