Flume, mail # user - File Sink/Source

Re: File Sink/Source
Jeff Lord 2013-10-08, 02:16
Yes, the file channel is designed to handle this, and it is what you should
be using. You are also on the right track in sizing your file channel to
account for the number of events that could accumulate if your terminal
sink is unable to complete transactions. At roughly 10MB a second for 2
hours, the amount of data you want to buffer would need a file channel of
around 72GB.
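
The 72GB figure follows directly from the numbers in the original question (about 5K events/sec at about 2KB each, over a 2-hour outage). A minimal Python sketch of that arithmetic, assuming decimal units (1KB = 1,000 bytes); `backlog_size` is a hypothetical helper, not part of Flume. The event count matters as much as the byte count, because the file channel's `capacity` setting is expressed in events, and the Flume 1.x user guide lists its default as 1,000,000, so it would have to be raised well above that (check the defaults for the Flume version you run).

```python
def backlog_size(events_per_sec, event_bytes, outage_sec):
    """Return (events, bytes) that pile up in the channel while the sink is down."""
    events = events_per_sec * outage_sec
    return events, events * event_bytes

# Figures from the question: ~5K events/sec, ~2KB per event, 2-hour outage.
events, total_bytes = backlog_size(5_000, 2_000, 2 * 60 * 60)
print(events)               # 36000000 events
print(total_bytes / 10**9)  # 72.0 (GB, decimal units)
```

In practice you would leave some headroom on top of this, since the outage could run longer than planned.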
Some other things you should consider here are the size of your hard
drives, the drain rate of a single sink on that channel once the terminal
destination is up again, durability in the event of a drive failure, and
so on. For these reasons you may decide to run a few agents on separate
hosts to help spread the load.
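
Drain rate matters because new events keep arriving while the sink works off the backlog, so the backlog only shrinks at the difference between the sink's throughput and the incoming rate. A rough Python sketch, assuming traffic is split evenly across agents and each agent has a single sink; the 8,000 events/sec sink throughput is a made-up figure for illustration, and `drain_seconds` is a hypothetical helper.

```python
import math

def drain_seconds(backlog_events, incoming_per_sec, sink_per_sec, agents=1):
    """Estimate seconds until the backlog clears after the destination recovers.

    Assumes the backlog and incoming traffic are split evenly across `agents`,
    and each agent's single sink drains at `sink_per_sec` events/sec while new
    events keep arriving at `incoming_per_sec / agents`.
    """
    per_agent_backlog = backlog_events / agents
    net_rate = sink_per_sec - incoming_per_sec / agents
    if net_rate <= 0:
        return math.inf  # sink cannot outpace new traffic; backlog never clears
    return per_agent_backlog / net_rate

# Hypothetical sink throughput of 8,000 events/sec.
print(drain_seconds(36_000_000, 5_000, 8_000))            # 12000.0 (about 3.3 hours)
print(drain_seconds(36_000_000, 5_000, 8_000, agents=3))  # about 1895 s (roughly 32 minutes)
```

With these assumed numbers, one agent needs over three hours to catch up, while three agents sharing the same traffic need about half an hour, which is one concrete reason to spread the load.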

Hope this is helpful.

On Mon, Oct 7, 2013 at 6:54 AM, David Sinclair wrote:

> I am using an AMQP Source, so I don't see how changing to a JMS source
> would make any difference.
> I am concerned about the volume of data and the file channel. Even if I
> switched to JMS, my question would be the same.
> On Fri, Oct 4, 2013 at 4:46 PM, Hari Shreedharan wrote:
>>  Have you tried the JMS Source? It can pick up data directly into Flume.
>> Thanks,
>> Hari
>> On Friday, October 4, 2013 at 11:59 AM, David Sinclair wrote:
>> Hi,
>> I have a question regarding the RollingFileSink and
>> SpoolingDirectorySource. I was trying to write everything from an AMQP
>> source to a file sink, then have the spooling directory source pick up
>> these files. This won't work as the files aren't immutable.
>> If I use a File Channel to store the events between my source and sink,
>> is there a concern about the number of events in the channel if the sink is
>> unable to deliver said events? For example, I will be getting around 5K
>> messages/sec, each about 2KB, so roughly 10MB a second. If the
>> sink is unable to deliver the messages for 2 hours, that would be 36
>> million events in the channel.
>> Is the file channel designed to handle this? Or should I have a file sink
>> in between?
>> thanks
>> dave