|
Mohit Anchlia
2013-01-16, 17:46
Hari Shreedharan
2013-01-16, 17:55
Mohit Anchlia
2013-01-16, 18:21
Brock Noland
2013-01-16, 18:26
Mohit Anchlia
2013-01-16, 18:47
Nitin Pawar
2013-01-16, 18:55
Brock Noland
2013-01-16, 19:00
Mohit Anchlia
2013-01-16, 19:03
Roshan Naik
2013-01-17, 01:13
Juhani Connolly
2013-01-17, 02:00
|
-
Memory Channel
Mohit Anchlia 2013-01-16, 17:46
Could someone help me understand the capacity attribute of memoryChannel? Does it mean that memoryChannel flushes to the sink only when this capacity is reached, or does it mean that it is the maximum number of events stored in memory and the call blocks until space is freed?
http://flume.apache.org/FlumeUserGuide.html#memory-channel
-
Re: Memory Channel
Hari Shreedharan 2013-01-16, 17:55
It means that the channel can store that many events. If the channel is full, a put() call (on the source side) blocks for at most keep-alive seconds waiting for space; if none frees up in that time, it throws a ChannelException.
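As a rough illustration of the capacity and keep-alive behavior described above, here is a toy model in Python. It is only a sketch under stated assumptions, not Flume's Java implementation: the class names MemoryChannelSketch and ChannelFullError are made up for this example, and the real channel works through transactions rather than bare put/take calls.

```python
import queue


class ChannelFullError(Exception):
    """Stand-in for Flume's ChannelException (hypothetical name for this sketch)."""


class MemoryChannelSketch:
    """Toy model: a bounded in-memory queue with a put timeout."""

    def __init__(self, capacity, keep_alive_seconds):
        # capacity is the maximum number of events held at once.
        self._queue = queue.Queue(maxsize=capacity)
        self._keep_alive = keep_alive_seconds

    def put(self, event):
        try:
            # Wait up to keep_alive seconds for free space, then give up.
            self._queue.put(event, timeout=self._keep_alive)
        except queue.Full:
            raise ChannelFullError("channel is at capacity") from None

    def take(self):
        # Return None when the channel is empty instead of blocking.
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None

    def size(self):
        return self._queue.qsize()
```

With a capacity of 2, a third put() fails after the keep-alive wait unless a take() has freed a slot in the meantime. Reaching capacity does not trigger any flush to the sink.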
-
Re: Memory Channel
Mohit Anchlia 2013-01-16, 18:21
Thanks! What I am really trying to understand is when Flume syncs to the sink. I am not using batched events.
-
Re: Memory Channel
Brock Noland 2013-01-16, 18:26
Memory Channel does not write to disk, and as such never syncs to disk. File Channel does sync to disk for each batch put on or taken off the channel.
-
Re: Memory Channel
Mohit Anchlia 2013-01-16, 18:47
Right. I was asking about sync to the "sink". My sink is HDFS, so does Flume sync to HDFS on every write operation?
-
Re: Memory Channel
Nitin Pawar 2013-01-16, 18:55
You can configure it as you need: by number of events, by rollover time, and in other ways as well.
-
Re: Memory Channel
Brock Noland 2013-01-16, 19:00
The HDFS Sink syncs at the end of each batch or when the file rolls.
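To make "end of each batch or when the file rolls" concrete, the following Python sketch computes where a sink following that rule would sync. It is a simplified model, not the HDFS Sink's code: the function name sync_points and its parameters are hypothetical, and only a count-based roll is modeled (rolls by time or size are left out).

```python
def sync_points(batches, roll_count):
    """Return the 1-based event numbers after which this toy sink would sync.

    batches    -- list of batch sizes the sink takes from the channel, in order
    roll_count -- hypothetical "roll the file after this many events" setting
    """
    points = []
    written_total = 0
    written_in_file = 0
    for batch in batches:
        for _ in range(batch):
            written_total += 1
            written_in_file += 1
            if written_in_file == roll_count:
                # File rolls: closing the current file implies a sync.
                points.append(written_total)
                written_in_file = 0
        # End of batch: sync, unless a roll on the last event already did.
        if batch and (not points or points[-1] != written_total):
            points.append(written_total)
    return points
```

For example, sync_points([3, 3], 4) gives syncs after events 3, 4 and 6: two batch ends plus one roll. With batches of a single event, as when batching is not used, every event is followed by a sync.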
-
Re: Memory Channel
Mohit Anchlia 2013-01-16, 19:03
Just one more question: when I write using memoryChannel, does that write immediately get written to the sink? It may not get synced on HDFS, but does it at least get written immediately? I am trying to see whether the events are held in Flume's memory or not.
-
Re: Memory Channel
Roshan Naik 2013-01-17, 01:13
The source and sink operate on independent threads. The source pumps data into the (memory) channel, which is basically an in-memory queue, and the sink drains the queue asynchronously. So, depending on the speed of the sink, data can remain in the channel for a long or short time.
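The decoupling described above can be sketched with two Python threads sharing a bounded queue. This is a minimal illustration, not Flume code: run_pipeline and its parameters are invented for the example, and the source here blocks indefinitely on a full queue, whereas Flume's memory channel gives up after keep-alive seconds.

```python
import queue
import threading
import time


def run_pipeline(num_events, capacity, sink_delay_seconds):
    """Run a toy source and sink on separate threads; return what the sink got."""
    channel = queue.Queue(maxsize=capacity)  # the in-memory channel
    delivered = []
    done = object()  # sentinel telling the sink there is nothing more to drain

    def source():
        for i in range(num_events):
            channel.put(i)  # blocks while the channel is full
        channel.put(done)

    def sink():
        while True:
            event = channel.get()
            if event is done:
                break
            # A slow sink leaves events sitting in the channel for longer.
            time.sleep(sink_delay_seconds)
            delivered.append(event)

    threads = [threading.Thread(target=source), threading.Thread(target=sink)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return delivered
```

Events arrive at the sink in order, but how long each one waits in memory depends only on how fast the sink drains the queue.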
-
Re: Memory Channel
Juhani Connolly 2013-01-17, 02:00
The channel is a temporary storage device that decouples the source from the sink. Adding data to it and removing data from it are done with transactions that either put or take one or more events. Sources put data in and sinks take it out.

When a batch is received by the source, the source stores it in the channel. If this is a memory channel, the only guarantee is that all the events are now stored in memory on this agent.

When a sink then processes a batch of data, that data is removed from the channel once the sink commits the transaction. If the sink is a RollingFileSink or another similar physical-media sink, at this point you could consider the data as having been synced.

The timing of the sink's process() calls, each of which handles a batch of events (what you are referring to as syncing), is governed by the sink runner, which has its own thread. If your source is generating data faster than your sink can process it, there can be an increasing delay between an event being put in the channel and it getting "synced" to HDFS or whatever the destination is. This can often be resolved by increasing thread counts or adding more sinks, but it may be caused by HDFS or your disk simply being too slow.
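The commit step on the sink side can be sketched as follows. This is a toy model in Python under stated assumptions, not Flume's Transaction API: the class TakeTransaction is hypothetical, and a plain deque stands in for the channel.

```python
from collections import deque


class TakeTransaction:
    """Toy take-transaction: a take only becomes permanent on commit."""

    def __init__(self, channel):
        self._channel = channel  # a deque acting as the channel
        self._taken = []         # events handed to the sink, not yet committed

    def take(self):
        if not self._channel:
            return None
        event = self._channel.popleft()
        self._taken.append(event)
        return event

    def commit(self):
        # The sink has stored the batch; forget the staged events for good.
        self._taken.clear()

    def rollback(self):
        # The sink failed; put the staged events back in their original order.
        self._channel.extendleft(reversed(self._taken))
        self._taken.clear()
```

If the sink fails partway through a batch and rolls back, the events are back in the channel for the next attempt. Only a commit makes their removal final.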
|