Flume >> mail # user >> Flume throughput correlation with RAM


Jagadish Bihani 2012-10-09, 07:46
Brock Noland 2012-10-09, 14:31
Re: Flume throughput correlation with RAM
Hi

Thanks for the input, Brock. After several experiments, the problem
eventually boiled down to the disks.

-- I used the same configuration on all 3 machines, so all software
components are the same.
-- The user guide says that if multiple file channel instances are
active on the same agent, then different disks are preferable. But in
my case *only one file channel is active per agent.*
-- The only pattern I observed is that the machines where I got better
performance have multiple disks. But I don't understand how that helps
if I have only 1 active file channel.
-- What is the impact of the type of disk / disk device driver on
performance? I don't understand why I get 40 KB/sec with one disk and
2 MB/sec with another.

Could you please elaborate on the correlation between the file channel
and disks?
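
For comparing the disks directly, outside Flume, a rough sketch like the
following may help. It times small appends that are each followed by an
fsync, which is a simplification of a durable channel's write pattern and
not Flume's actual code. The function name `measure_synced_appends` and its
parameters are made up for illustration. On drives or controllers that
acknowledge a sync before the data is really on disk, the numbers will look
better than the hardware deserves.

```python
import os
import tempfile
import time


def measure_synced_appends(directory, writes=200, payload_size=512):
    """Append small records, calling fsync after each one, and return timing stats.

    `directory` should live on the disk being tested (hypothetical usage:
    the directory a file channel would write its data files to).
    """
    payload = b"x" * payload_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush this file's data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return {
        "writes": writes,
        "bytes": writes * payload_size,
        "seconds": elapsed,
        "syncs_per_sec": writes / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # The temp directory is only a placeholder; pass a path on the disk under test.
    print(measure_synced_appends(tempfile.gettempdir(), writes=50))
```

Running it once per candidate directory, for example one on each physical
disk, gives a syncs-per-second figure that can be compared across the three
machines.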

Regards,
Jagadish

On 10/09/2012 08:01 PM, Brock Noland wrote:
> Hi,
>
> Using file channel, in terms of performance, the number and type of
> disks is going to be much more predictive of performance than CPU or
> RAM. Note that consumer level drives/controllers will give you much
> "better" performance because they lie to you about when your data is
> actually written to the drive. If you search for "fsync lies" you'll
> find more information on this.
>
> You probably want to increase the batch size to get better performance.
>
> Brock
>
> On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
> <[EMAIL PROTECTED]>  wrote:
>> Hi
>>
>> My flume setup is:
>>
>> Source Agent : cat source - File Channel - Avro Sink
>> Dest Agent :     avro source - File Channel - HDFS Sink.
>>
>> There is only 1 source agent and 1 destination agent.
>>
>> I measure throughput as amount of data written to HDFS per second.
>> (I have a rolling interval of 30 sec, so if a 60 MB file is generated
>> in 30 sec, the throughput is 2 MB/sec.)
>>
>> I have run source agent on various machines with different hardware
>> configurations :
>> (In all cases I run flume agent with JAVA OPTIONS as
>> "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
>> -XX:MaxDirectMemorySize=2g")
>>
>> JDK is 32 bit.
>>
>> Experiment 1:
>> RAM : 16 GB
>> Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
>> 64 bit Processor with 64 bit Kernel.
>> Throughput: 2 MB/sec
>>
>> Experiment 2:
>> RAM : 4 GB
>> Processor: Intel Xeon E5504  @ 2.00GHz (4 cores). 32 bit Processor
>> 64 bit Processor with 32 bit Kernel.
>> Throughput : 30 KB/sec
>>
>> Experiment 3:
>> RAM : 8 GB
>> Processor:Intel Xeon E5520 @ 2.27 GHz (16 cores).32 bit Processor
>> 64 bit Processor with 32 bit Kernel.
>> Throughput : 80 KB/sec
>>
>> -- As can be seen, there is a huge difference in throughput with the
>> same configuration but different hardware.
>> -- In the first case, where throughput is higher, RES is around 160 MB;
>> in the other cases it is in the range of 40 MB - 50 MB.
>>
>> Can anybody please give insight into why there is this huge difference
>> in throughput? What is the correlation between RAM and file channel/HDFS
>> sink performance, and also with a 32-bit/64-bit kernel?
>>
>> Regards,
>> Jagadish
>
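
The batch size suggestion in the quoted reply can be reasoned about with a
rough model. The sketch below assumes each committed batch costs about one
disk sync and ignores every other cost. The function names, the 200-byte
event size, and the 10 ms sync time are hypothetical values, not
measurements from this thread. Only the 60 MB per 30 sec figure comes from
the original message.

```python
def observed_throughput(file_bytes, roll_interval_seconds):
    """Throughput as measured in the thread: bytes written per roll interval."""
    if roll_interval_seconds <= 0:
        raise ValueError("roll_interval_seconds must be positive")
    return file_bytes / roll_interval_seconds


def estimated_throughput(batch_size, event_bytes, sync_seconds):
    """Bytes per second under the assumption of one disk sync per committed batch."""
    if batch_size <= 0 or event_bytes <= 0 or sync_seconds <= 0:
        raise ValueError("all arguments must be positive")
    return batch_size * event_bytes / sync_seconds


if __name__ == "__main__":
    MB = 1024 * 1024
    # From the thread: a 60 MB file per 30 sec roll interval.
    print("observed MB/sec:", observed_throughput(60 * MB, 30) / MB)
    # Hypothetical inputs: 200-byte events and 10 ms per sync.
    for batch in (1, 100, 1000):
        print("batch", batch, "bytes/sec", estimated_throughput(batch, 200, 0.010))
```

Under this model throughput scales linearly with the batch size until some
other limit takes over, which is why a slow-syncing disk is hurt most by
small batches.
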
Brock Noland 2012-10-10, 15:54
Jagadish Bihani 2012-10-10, 16:00
Brock Noland 2012-10-10, 16:05
Jagadish Bihani 2012-10-10, 16:22
Brock Noland 2012-10-10, 18:00