Yes, to avoid them clobbering each other's writes.
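To illustrate, here is a minimal sketch of an agent config with one HDFS sink per output path (the names `a1`, `c1`, `k1`/`k2` and the paths are illustrative; the property names are from the Flume HDFS sink):

```
a1.channels = c1 c2
a1.sinks = k1 k2

# One sink per target path, so no two writers share a file.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/source1

a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c2
a1.sinks.k2.hdfs.path = /flume/events/source2
```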
On Tue, Nov 5, 2013 at 4:34 AM, Bojan Kostić <[EMAIL PROTECTED]> wrote:
> Sorry for the late response; I somehow lost this email.
> Thanks for the read. It is a nice start even though it is old,
> and the numbers are really promising.
> I'm testing the memory channel. There are about 20 data sources (log
> servers) with 60 different files each.
> My RPC client app is basic, like in the examples, but it load-balances
> across two Flume agents which write the data to HDFS.
> I think I read somewhere that you should have one sink per file. Is that
> correct?
> Best regards, and sorry again for the late response.
> On Oct 22, 2013 8:50 AM, "Juhani Connolly" <
> [EMAIL PROTECTED]> wrote:
>> Hi Bojan,
>> This is pretty old, but Mike did some testing on performance about a
>> year and a half ago:
>> He was getting a max of 70k events/sec on a single machine.
>> Thing is, this is the result of a huge number of variables:
>> - Parallelization of flows allows better parallel processing
>> - Use of the memory channel as opposed to a slower persistent channel
>> - Possibly the source; I have no idea how you wrote your app
>> - Batching of events is important. Also, are all events written to one
>> file, or are they split over many? Every file is processed separately.
>> - Network congestion and your HDFS setup
>> Reaching 100k events per second is definitely possible. The resources you
>> need for it will vary significantly depending on your setup. The
>> more HA-type features you use, the slower delivery is likely to become. On
>> the flip side, accepting fairly lax conditions that carry a small potential
>> for data loss (on a crash, for example, memory channel contents are gone)
>> will allow close to 100k even on a single machine.
>> On 10/14/2013 09:00 PM, Bojan Kostić wrote:
>>> Hi, this is my first post here, but I have been playing with Flume for
>>> some time now.
>>> My question is: how well does Flume scale?
>>> Can Flume ingest 100k+ events per second? Has anyone tried something
>>> like this?
>>> I created a simple test and the results are really slow.
>>> I wrote a simple app with an RPC client with fallback, using the Flume
>>> SDK, which reads a dummy log file.
>>> In the end I have two Flume agents which write to HDFS.
>>> rollInterval = 60
>>> And in HDFS I get files of ~12MB.
>>> Do I need to use some complex topology with 3 tiers?
>>> How many Flume agents should write to HDFS?
>>> Best regards.
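For what it's worth, the knobs most relevant to the numbers above sit on the HDFS sink. A sketch (property names from the Flume HDFS sink; the values are illustrative, not recommendations):

```
# Roll a new file every 60 s or at ~128 MB, whichever comes first;
# rollCount = 0 disables rolling by event count.
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
# Flush events to HDFS in batches rather than one at a time.
a1.sinks.k1.hdfs.batchSize = 1000
```

Small ~12MB files with rollInterval = 60 suggest the sink is rolling on time before enough data arrives; raising the batch size and letting size-based rolling kick in is one of the usual first tuning steps.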