Flume >> mail # user >> File Sink/Source


Re: File Sink/Source
Thanks much, Jeff. This is exactly what I needed to know. Much appreciated.

I have also been experimenting with running multiple flows on the same agent,
each writing to a different disk, to improve throughput.
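
One way to express the multiple-flows-on-separate-disks idea is to give each flow its own file channel whose checkpoint and data directories live on a different disk. The sketch below generates only the channel section of a Flume properties file, using the file channel's `type`, `checkpointDir`, `dataDirs`, and `capacity` properties. The agent name `a1`, the channel names `fc1`, `fc2`, ..., and the `/data/diskN` mount points are hypothetical, and each flow's source and sink would still need to be bound to its own channel. Splitting the expected backlog evenly across channels is also an assumption; it only holds if traffic is spread evenly across the flows.

```python
"""Sketch: emit Flume properties for N parallel file channels, one per disk.

HYPOTHETICAL: the agent name, channel names (fc1, fc2, ...) and the disk
mount points are illustrative assumptions, not taken from the thread.
"""


def file_channel_props(agent: str, disks: list[str], total_capacity: int) -> str:
    """Return properties lines for one file channel per disk.

    total_capacity is the number of events the flows must buffer in total;
    it is split evenly across channels (rounded up so nothing is dropped).
    """
    names = [f"fc{i}" for i in range(1, len(disks) + 1)]
    per_channel = -(-total_capacity // len(disks))  # ceiling division
    lines = [f"{agent}.channels = {' '.join(names)}"]
    for name, disk in zip(names, disks):
        prefix = f"{agent}.channels.{name}"
        lines += [
            f"{prefix}.type = file",
            # Keep both checkpoint and data for this channel on its own disk.
            f"{prefix}.checkpointDir = {disk}/flume/{name}/checkpoint",
            f"{prefix}.dataDirs = {disk}/flume/{name}/data",
            f"{prefix}.capacity = {per_channel}",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(file_channel_props("a1", ["/data/disk1", "/data/disk2", "/data/disk3"], 36_000_000))
```

An alternative worth comparing is a single file channel whose `dataDirs` lists several directories on separate disks; the Flume User Guide notes that this can also improve file channel performance, without splitting the stream into separate flows.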
On Mon, Oct 7, 2013 at 10:16 PM, Jeff Lord <[EMAIL PROTECTED]> wrote:

> Yes, the file channel is designed to handle this and is what you should be
> using.
> You are also on the right track regarding sizing your file channel to
> account for the number of events that could accumulate if your terminal
> sink is unable to complete transactions. With the amount of data you would
> like to buffer (about 10 MB/s for 2 hours), you will need a file channel of
> roughly 72 GB.
> Some other things to consider here are the size of your hard drives, the
> drain rate of a single sink on that channel once the terminal destination
> is up again, durability in the event of a drive failure, and so on. For
> these reasons you may decide to run a few agents on separate hosts to help
> spread the load.
>
> Hope this is helpful.
>
> -Jeff
>
>
> On Mon, Oct 7, 2013 at 6:54 AM, David Sinclair <
> [EMAIL PROTECTED]> wrote:
>
>> I am using an AMQP source, so I don't see how changing to a JMS source
>> would make any difference.
>>
>> I am concerned about the volume of data and the file channel. Even if I
>> switched to JMS, my question would be the same.
>>
>>
>> On Fri, Oct 4, 2013 at 4:46 PM, Hari Shreedharan <
>> [EMAIL PROTECTED]> wrote:
>>
>>>  Have you tried the JMS Source? It can pick up data directly into Flume.
>>>
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Friday, October 4, 2013 at 11:59 AM, David Sinclair wrote:
>>>
>>> Hi,
>>>
>>> I have a question regarding the RollingFileSink and
>>> SpoolingDirectorySource. I was trying to write everything from an AMQP
>>> source to a file sink, then have the spooling directory source pick up
>>> these files. This won't work, because the spooling directory source
>>> requires files to be immutable once they are in its directory, and the
>>> sink's current file is still being written.
>>>
>>> If I use a File Channel to store the events between my source and sink,
>>> is there a concern about the number of events in the channel if the sink is
>>> unable to deliver said events? For example, I will be receiving around 5K
>>> messages/sec, each about 2 KB, so roughly 10 MB a second. If the sink is
>>> unable to deliver the messages for 2 hours, that would be 36 million
>>> events in the channel.
>>>
>>> Is the file channel designed to handle this? Or should I have a file
>>> sink in between?
>>>
>>> thanks
>>>
>>> dave
>>>
>>>
>>>
>>
>
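
The sizing arithmetic in this thread (about 5K messages/sec at roughly 2 KB each, buffered through a 2-hour outage) and Jeff's point about the drain rate of a single sink can be checked with a short sketch. It treats 2 KB as 2,000 bytes to match the "10 MB a second" figure and ignores the file channel's own on-disk overhead. Once the sink recovers, new events keep arriving, so the backlog only shrinks at the difference between the drain rate and the incoming rate. The 8,000 events/sec drain rate used below is a hypothetical placeholder, not a measured value.

```python
"""Back-of-the-envelope sizing for a Flume file channel backlog.

Figures from the thread: ~5,000 events/sec, ~2 KB each, 2-hour outage.
HYPOTHETICAL: the 8,000 events/sec sink drain rate is an assumed value.
"""


def backlog_events(events_per_sec: float, outage_sec: float) -> float:
    """Events that pile up in the channel while the sink cannot deliver."""
    return events_per_sec * outage_sec


def backlog_bytes(events_per_sec: float, event_bytes: float, outage_sec: float) -> float:
    """Raw payload bytes for that backlog (ignores file channel overhead)."""
    return backlog_events(events_per_sec, outage_sec) * event_bytes


def drain_seconds(backlog: float, in_rate: float, drain_rate: float) -> float:
    """Time to empty the backlog once the sink is delivering again.

    Events keep arriving at in_rate, so the backlog shrinks at only
    (drain_rate - in_rate) events/sec.
    """
    if drain_rate <= in_rate:
        raise ValueError("sink drain rate must exceed the incoming rate")
    return backlog / (drain_rate - in_rate)


if __name__ == "__main__":
    rate = 5_000          # events/sec (from the thread)
    size = 2_000          # bytes/event, decimal KB to match "10MB a second"
    outage = 2 * 60 * 60  # 2 hours, in seconds

    events = backlog_events(rate, outage)
    gb = backlog_bytes(rate, size, outage) / 1e9
    print(f"backlog: {events:,.0f} events, ~{gb:.0f} GB")

    # HYPOTHETICAL drain rate for a single sink after the outage ends.
    hours = drain_seconds(events, rate, 8_000) / 3600
    print(f"drain time at 8,000 events/sec: ~{hours:.1f} h")
```

Running it prints a backlog of 36,000,000 events (about 72 GB) and a drain time of about 3.3 hours under the assumed drain rate, which illustrates why a slow single sink can matter as much as disk size. Note also that the Flume User Guide lists a default file channel `capacity` of 1,000,000 events, so buffering a backlog of this size would require raising that setting as well.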