Flume, mail # user - performance


Nathaniel Auvil 2012-11-07, 18:48
Hari Shreedharan 2012-11-07, 19:03
Nathaniel Auvil 2012-11-07, 19:08
Re: performance
Hari Shreedharan 2012-11-07, 19:18
The channel is a passive component and has no notion of blocking. All the Flume channels support multiple transactions happening simultaneously, and none of these transactions block. Multiple sinks can pull events out of the same channel, and none of them would block. If a channel has no events to return, the take() method simply returns null. If the sink did not get any events at all in that transaction, the sink's process method should return BACKOFF, so that the sink runner waits for a few seconds before calling the process method again.
Hari

--
Hari Shreedharan
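
To make the behaviour described above concrete, here is a minimal, self-contained sketch of two sinks draining one channel. It does not use the Flume library: SimpleChannel, SimpleSink and the local Status enum are hypothetical stand-ins that only mimic the shape of Flume's Channel.take() and Sink.Status. Transactions are left out for brevity; a real sink would wrap its takes in a transaction obtained from the channel and commit or roll it back.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class BackoffSketch {
    // Stand-in for Flume's Sink.Status (assumption: only these two values matter here).
    enum Status { READY, BACKOFF }

    // Hypothetical simplified channel: take() never blocks and returns null when empty.
    static class SimpleChannel {
        private final ConcurrentLinkedQueue<String> events = new ConcurrentLinkedQueue<>();

        void put(String event) {
            events.add(event);
        }

        String take() {
            return events.poll(); // null if no event is available
        }
    }

    // Hypothetical simplified sink: pulls up to batchSize events per process() call.
    static class SimpleSink {
        private final SimpleChannel channel;
        private final int batchSize;
        int delivered = 0;

        SimpleSink(SimpleChannel channel, int batchSize) {
            this.channel = channel;
            this.batchSize = batchSize;
        }

        Status process() {
            int taken = 0;
            for (int i = 0; i < batchSize; i++) {
                String event = channel.take();
                if (event == null) {
                    break; // channel is empty; do not wait
                }
                taken++; // a real sink would write the event to its destination here
            }
            delivered += taken;
            // No events at all in this call: ask the runner to back off.
            return taken == 0 ? Status.BACKOFF : Status.READY;
        }
    }

    public static void main(String[] args) {
        SimpleChannel channel = new SimpleChannel();
        for (int i = 0; i < 3; i++) {
            channel.put("event-" + i);
        }
        SimpleSink first = new SimpleSink(channel, 2);
        SimpleSink second = new SimpleSink(channel, 2);

        System.out.println(first.process());  // takes two events
        System.out.println(second.process()); // takes the remaining event
        System.out.println(first.process());  // channel is empty
    }
}
```

Both sinks read from the same channel without coordinating with each other, and an empty channel only results in a BACKOFF return value rather than a blocked thread.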
On Wednesday, November 7, 2012 at 11:08 AM, Nathaniel Auvil wrote:

> it is my understanding, perhaps incorrectly, that when you start a transaction in a sink, the channel blocks until that transaction is committed.  Are you saying you can have multiple sinks pulling simultaneously from a single channel and the transactional semantics will not cause blocking?
>
>
> On Wed, Nov 7, 2012 at 2:03 PM, Hari Shreedharan wrote:
> > Hi Nathaniel,
> >
> > What do you mean by a single-threaded model? Almost all of Flume's components are multithreaded. If you mean that a sink is driven by one thread, you can always add more sinks, and each one will be driven by its own thread. If you want to write the same data to multiple locations, just add more channels to the same source (thus replicating the data) and attach the sinks as required. If you want to write to a higher-latency location, you can either add multiple sinks reading from the same channel (thus creating multiple sink runners), or make your sink multithreaded (spawn multiple threads inside the process method and then wait for all threads to succeed or fail), so that more threads do I/O.
> >
> >
> > Hari
> > --
> > Hari Shreedharan
> >
> >
> > On Wednesday, November 7, 2012 at 10:48 AM, Nathaniel Auvil wrote:
> >
> > > In addition to HDFS, I need to support sending events to a higher-latency (network-related) target, which our current implementation mitigates by using more than one thread. The model for Flume is single-threaded. How do I support this with Flume? Multiplex over n channels with a sink on each?
> > >
> > >
> >
> >
>