Re: Restarts without data loss
Hari: do you mean multiple disks rather than multiple folders? Running
off a single disk, the performance is unfortunately not "reasonably good".

The reality for most companies hoping to aggregate logs is that many of
the machines generating those logs have a single set of RAIDed disks, so
using multiple disks is not an option. Please keep this in mind when
running tests, rather than testing only the "best case scenario". After
all, Flume will be co-habiting on servers that were built for their
primary purpose, not for Flume.

In our case, this is what we had hoped to do on our log sources, and are
currently doing with scribed (which has its own issues, hence our wanting
to move):

- Run agents on all our log generating servers, using a channel that can
retain data in case of network issues communicating with the collector
layer.
  - Current setup is a scribed buffer store with network store as
primary, file as secondary.
  - Intended setup with Flume was a file channel connected to an avro
sink. With only a single disk available, the file channel is extremely
slow. The JDBC channel is also extremely slow, and MemoryChannel will
fill up and start refusing puts as soon as a network issue comes up.
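
For concreteness, the intended setup above (file channel feeding an avro
sink) could be sketched roughly as follows. This is only an illustration:
the agent and component names (agent1, src1, ch1, sink1), the exec source,
all paths, and the collector host and port are placeholders assumed for
the example, not taken from our actual configuration.

```properties
# Hypothetical agent: names, paths, host and port are placeholders.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Example source tailing a local log file (assumed for illustration).
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# Durable file channel. On a single-disk host the checkpoint and data
# directories end up on the same physical disk.
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /var/lib/flume/checkpoint
agent1.channels.ch1.dataDirs = /var/lib/flume/data

# Avro sink forwarding events to the collector layer.
agent1.sinks.sink1.type = avro
agent1.sinks.sink1.hostname = collector.example.com
agent1.sinks.sink1.port = 4141
agent1.sinks.sink1.channel = ch1
```

The "multiple data folders" suggestion would correspond to a
comma-separated list in dataDirs; as noted above, that would presumably
only help when the directories sit on separate physical disks.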

I think this is a very common use case, and one that is likely holding up
adoption until it is solved (at least it is for us).

On 07/09/2012 04:07 PM, Hari Shreedharan wrote:
> Senthil,
>
> Have you tried using it recently, with multiple data folders etc.? In
> recent tests, we have seen reasonably good performance. Of course, the
> performance of MemoryChannel would be much better, since it is
> in-memory :-). You should try to use the FileChannel as much as you
> can, else there is a risk of losing data.
>
> Thanks
> Hari
>
> --
> Hari Shreedharan
>
> On Monday, July 9, 2012 at 12:01 AM, Senthilvel Rangaswamy wrote:
>
>> We do use a persistent channel when there is overflow. Using
>> FileChannel for regular operations is too slow for us.
>>
>> On Sun, Jul 8, 2012 at 11:58 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>> I am guessing you are aware, but you could use a persistent channel
>>> such as file channel.
>>>
>>> --
>>> Brock Noland
>>>
>>> On Monday, July 9, 2012 at 7:18 AM, Senthilvel Rangaswamy wrote:
>>>
>>>> We are using Flume 1.2.0 with the memory channel. When we roll out
>>>> new configs/decorators, we may need to restart Flume, at which point
>>>> any events in the memory channel are gone. Is there any way to avoid
>>>> this?
>>>>
>>>> Thanks,
>>>> --
>>>> ..Senthil
>>>>
>>>> "If there's anything more important than my ego around, I want it
>>>>  caught and shot now."
>>>>            - Douglas Adams.
>>>>
>>>
>>
>