Flume, mail # user - Can Flume handle +100k events per second?


Re: Can Flume handle +100k events per second?
Bojan Kostić 2013-11-06, 09:39
It was late when I wrote my last mail, and my explanation was not clear.
I will illustrate:
20 servers, each with 60 different log files.
I was thinking that I could have this kind of structure on hdfs:
/logs/server0/logstat0.log
/logs/server0/logstat1.log
.
.
.
/logs/server20/logstat0.log
.
.
.

But from your info I see that I can't do that.
I could try to add a server id column to every file and then aggregate the
files from all servers into one file per log type:
/logs/logstat0.log
/logs/logstat1.log
.
.
.

But then again I would need 60 sinks.
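The aggregated layout above could be expressed as a single Flume agent configuration. Below is a minimal sketch, not a tested setup: the component names (agent1, logsrc, memch, hdfssink), port, and paths are illustrative assumptions. Note that the host interceptor records the host of the agent it runs on, so to capture the originating server it belongs on the sending tier (or the RPC client can set a "host" header itself); turning that header into a real column in the file body would additionally need a custom interceptor or serializer.

```properties
# One source, one channel, one HDFS sink for one aggregated log type.
# All names, ports and paths here are illustrative assumptions.
agent1.sources = logsrc
agent1.channels = memch
agent1.sinks = hdfssink

# Avro source receiving events sent by the log servers' RPC clients
agent1.sources.logsrc.type = avro
agent1.sources.logsrc.bind = 0.0.0.0
agent1.sources.logsrc.port = 4141
agent1.sources.logsrc.channels = memch

# Host interceptor tags each event with a "host" header, standing in
# for the "server id column" (see the caveat in the text above)
agent1.sources.logsrc.interceptors = hostint
agent1.sources.logsrc.interceptors.hostint.type = host
agent1.sources.logsrc.interceptors.hostint.hostHeader = host

agent1.channels.memch.type = memory
agent1.channels.memch.capacity = 100000

# Single HDFS sink: all events for this log type wind up in one place
agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.channel = memch
agent1.sinks.hdfssink.hdfs.path = /logs
agent1.sinks.hdfssink.hdfs.filePrefix = logstat0
```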
On Nov 6, 2013 2:02 AM, "Roshan Naik" <[EMAIL PROTECTED]> wrote:

> I assume you mean  you have 120 source files to be streamed into HDFS.
> There is not a 1-1 correspondence between source files and destination
> hdfs files.  If they are on the same host, you can have them all picked up
> through one source, one channel and one hdfs sink... winding up in a
> single hdfs file.
>
> In case you have a config with multiple HDFS sinks (part of a single agent
> or spanning multiple agents) you want to ensure each HDFS sink writes to a
> separate file in HDFS.
>
>
> On Tue, Nov 5, 2013 at 4:41 PM, Bojan Kostić <[EMAIL PROTECTED]>wrote:
>
>> Hello Roshan,
>>
>> Thanks for response.
>> But I am now confused. If I have 120 files, do I need to configure 120
>> sinks/sources/channels separately? Or have I missed something in the docs?
>> Maybe I should use Fan out flow? But then again I must set 120 params.
>>
>> Best regards.
>> On Nov 5, 2013 8:47 PM, "Roshan Naik" <[EMAIL PROTECTED]> wrote:
>>
>>> yes. to avoid them clobbering each other's writes.
>>>
>>>
>>> On Tue, Nov 5, 2013 at 4:34 AM, Bojan Kostić <[EMAIL PROTECTED]>wrote:
>>>
>>>> Sorry for late response. But I lost this email somehow.
>>>>
>>>> Thanks for the read, it is a nice start even though it is old.
>>>> And the numbers are really promising.
>>>>
>>>> I'm testing the memory channel; there are about 20 data sources (log
>>>> servers) with 60 different files each.
>>>>
>>>> My RPC client app is basic, like in the examples. But it has load balancing
>>>> for two flume agents which are writing data to hdfs.
>>>>
>>>> I think I read somewhere that you should have one sink per file. Is
>>>> that true?
>>>>
>>>> Best regards, and sorry again for late response.
>>>>  On Oct 22, 2013 8:50 AM, "Juhani Connolly" <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi Bojan,
>>>>>
>>>>> This is pretty old, but Mike did some testing on performance about a
>>>>> year and a half ago:
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Syslog+Performance+Test+2012-04-30
>>>>>
>>>>> He was getting a max of 70k events/sec on a single machine.
>>>>>
>>>>> Thing is, this is a result of a huge number of variables:
>>>>> - Parallelization of flows allows better parallel processing
>>>>> - Use of the memory channel as opposed to a slower persistent channel.
>>>>> - Possibly the source. I have no idea how you wrote your app
>>>>> - Batching of events is important. Also are all events written to one
>>>>> file? Or are they split over many? Every file is separately processed.
>>>>> - Network congestion, your hdfs setup
>>>>>
>>>>> Reaching 100k events per second is definitely possible. The resources
>>>>> you need for it will vary significantly depending on your setup. The
>>>>> more HA-type features you use, the slower delivery is likely to become. On
>>>>> the flip side, allowing fairly lax conditions that carry a small potential
>>>>> for data loss (on a crash, for example, memory channel contents are gone)
>>>>> will allow for close to 100k even on a single machine.
>>>>>
>>>>> On 10/14/2013 09:00 PM, Bojan Kostić wrote:
>>>>>
>>>>>> Hi, this is my first post here, but I have been playing with Flume for
>>>>>> some time now.
>>>>>> My question is: how well does Flume scale?
>>>>>> Can Flume ingest +100k events per second? Has anyone tried something
>>>>>> like this?
>>>>>>
>>>>>> I created a simple test and the results are really slow.
>>>>>> I wrote a simple app with an RPC client with fallback using the Flume SDK
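Several of the variables Juhani lists above map directly to agent settings. The following is a hedged sketch of the throughput-related knobs, with assumed component names (agent1, memch, hdfssink) and illustrative values rather than a tested configuration:

```properties
# Illustrative throughput tuning; component names and values are assumptions.
# Memory channel trades durability for speed (contents are lost on a crash).
agent1.channels.memch.type = memory
agent1.channels.memch.capacity = 1000000
agent1.channels.memch.transactionCapacity = 10000

# Larger batches amortize per-transaction overhead on the HDFS sink
agent1.sinks.hdfssink.hdfs.batchSize = 10000

# Roll settings control how often new HDFS files are opened;
# rolling only by time (here every 5 minutes) avoids many small files
agent1.sinks.hdfssink.hdfs.rollInterval = 300
agent1.sinks.hdfssink.hdfs.rollSize = 0
agent1.sinks.hdfssink.hdfs.rollCount = 0
```

On the client side, the Flume SDK's RpcClient.appendBatch() serves the same batching purpose as hdfs.batchSize does on the sink.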
Roshan Naik 2013-11-06, 19:42
Juhani Connolly 2013-11-18, 08:50