Flume, mail # user - Flume throughput correlation with RAM


Re: Flume throughput correlation with RAM
Jagadish Bihani 2012-10-10, 16:00
Hi

Yes. It is around 480 - 500 bytes.
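For a rough sense of scale, the throughput figures from the experiments quoted below can be converted into events per second. This is only a back-of-the-envelope sketch: the 490-byte average is an assumed midpoint of the 480 - 500 byte range, the binary units (1 KB = 1024 bytes) are an assumption, and `events_per_second` is a hypothetical helper, not part of Flume.

```python
# Rough conversion of measured HDFS throughput into events per second.
# Assumptions: ~490 bytes per event (midpoint of 480-500 bytes),
# 1 KB = 1024 bytes, 1 MB = 1024 KB.

AVG_EVENT_BYTES = 490
KB = 1024
MB = 1024 * KB


def events_per_second(bytes_per_second, event_bytes=AVG_EVENT_BYTES):
    """Return the approximate number of events moved per second."""
    return bytes_per_second / event_bytes


# Throughput figures reported for the three experiments.
rates = {
    "Experiment 1 (2 MB/sec)": 2 * MB,
    "Experiment 2 (30 KB/sec)": 30 * KB,
    "Experiment 3 (80 KB/sec)": 80 * KB,
}

for label, rate in rates.items():
    print(f"{label}: ~{events_per_second(rate):.0f} events/sec")
```

Under these assumptions the fast machine moves a few thousand events per second, while the slow ones manage well under two hundred.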

On 10/10/2012 09:24 PM, Brock Noland wrote:
> How big are your events? Average about 400 bytes?
>
> Brock
>
> On Wed, Oct 10, 2012 at 5:11 AM, Jagadish Bihani
> <[EMAIL PROTECTED]> wrote:
>> Hi
>>
>> Thanks for the input, Brock. After several experiments, the problem
>> eventually boiled down to the disks.
>>
>>   -- I used the same configuration on all 3 machines, so all software
>> components are the same.
>> -- The User Guide says that if multiple file channel instances are
>> active on the same agent, then different disks are preferable. But in my
>> case only one file channel is active per agent.
>> -- The only pattern I observed is that the machines where I got better
>> performance have multiple disks. But I don't understand how that helps
>> if I have only 1 active file channel.
>> -- What is the impact of the type of disk/disk device driver on performance?
>> I don't understand why I am getting 40 KB/sec with one disk and
>> 2 MB/sec with another.
>>
>> Could you please elaborate on the correlation between the file channel
>> and disks?
>>
>> Regards,
>> Jagadish
>>
>>
>> On 10/09/2012 08:01 PM, Brock Noland wrote:
>>
>> Hi,
>>
>> With the file channel, the number and type of disks are going to be
>> much more predictive of performance than CPU or RAM. Note that
>> consumer-level drives/controllers will give you much "better"
>> performance because they lie to you about when your data is
>> actually written to the drive. If you search for "fsync lies" you'll
>> find more information on this.
>>
>> You probably want to increase the batch size to get better performance.
>>
>> Brock
>>
>> On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
>> <[EMAIL PROTECTED]> wrote:
>>
>> Hi
>>
>> My flume setup is:
>>
>> Source Agent : cat source - File Channel - Avro Sink
>> Dest Agent :     avro source - File Channel - HDFS Sink.
>>
>> There is only 1 source agent and 1 destination agent.
>>
>> I measure throughput as the amount of data written to HDFS per second.
>> (I have a rolling interval of 30 sec, so if a 60 MB file is generated in
>> 30 sec, the throughput is 2 MB/sec.)
>>
>> I have run the source agent on various machines with different hardware
>> configurations.
>> (In all cases I run the Flume agent with JAVA OPTIONS as
>> "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
>> -XX:MaxDirectMemorySize=2g")
>>
>> JDK is 32 bit.
>>
>> Experiment 1:
>> RAM: 16 GB
>> Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
>> 64 bit Processor with 64 bit Kernel.
>> Throughput: 2 MB/sec
>>
>> Experiment 2:
>> RAM: 4 GB
>> Processor: Intel Xeon E5504 @ 2.00 GHz (4 cores).
>> 64 bit Processor with 32 bit Kernel.
>> Throughput: 30 KB/sec
>>
>> Experiment 3:
>> RAM: 8 GB
>> Processor: Intel Xeon E5520 @ 2.27 GHz (16 cores).
>> 64 bit Processor with 32 bit Kernel.
>> Throughput: 80 KB/sec
>>
>>   -- As can be seen, there is a huge difference in throughput with the
>> same configuration but different hardware.
>> -- In the first case, where throughput is higher, RES is around 160 MB;
>> in the other cases it is in the range of 40 MB - 50 MB.
>>
>> Can anybody please give some insight into why there is this huge
>> difference in throughput?
>> What is the correlation between RAM and file channel/HDFS sink
>> performance, and also with a 32-bit/64-bit kernel?
>>
>> Regards,
>> Jagadish
>>
>>
>>
>
>
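Brock's point in the quoted thread, that disks predict file channel performance better than CPU or RAM, can be checked independently of Flume by timing small synced writes on each machine. The sketch below is not how Flume measures anything; it assumes the file channel syncs to disk as part of committing each batch, and `time_fsync`, the 500-byte payload, and the write count are all illustrative choices.

```python
import os
import tempfile
import time


def time_fsync(directory, writes=50, payload_bytes=500):
    """Return average seconds per small write followed by fsync in `directory`."""
    payload = b"x" * payload_bytes
    # Create a scratch file on the disk under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush this file to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed / writes


if __name__ == "__main__":
    # Point this at the disk that holds the file channel's data directory.
    # The system temp directory is only a placeholder and may be memory-backed.
    avg = time_fsync(tempfile.gettempdir())
    print(f"average write+fsync: {avg * 1000:.3f} ms")
```

If one machine reports a much lower time per sync than another, that alone could account for a large throughput gap, and a suspiciously low figure on a consumer drive may be the "fsync lies" effect rather than genuinely faster durable writes. A larger batch size spreads each sync over more events.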