Flume, mail # user - Flume throughput correlation with RAM

Flume throughput correlation with RAM
Jagadish Bihani 2012-10-09, 07:46

My Flume setup is:

Source Agent : cat source  - File Channel - Avro Sink
Dest Agent   : Avro source - File Channel - HDFS Sink

There is only one source agent and one destination agent.

I measure throughput as the amount of data written to HDFS per second.
(The roll interval is 30 sec, so if a 60 MB file is generated in 30 sec,
the throughput is 2 MB/sec.)
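The calculation above can be sketched as follows. This is only an illustration: the function name `throughput_mb_per_sec` is hypothetical, and it assumes 1 MB = 1,000,000 bytes and that exactly one file is written per roll interval.

```python
def throughput_mb_per_sec(bytes_written: int, interval_sec: float) -> float:
    """Average throughput over one roll interval.

    Assumption (not from the original post): 1 MB = 1,000,000 bytes.
    """
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    return bytes_written / 1_000_000 / interval_sec


# A 60 MB file produced during a 30 sec roll interval.
print(throughput_mb_per_sec(60 * 1_000_000, 30))  # prints 2.0
```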

I have run the *source agent* on various machines with different hardware
configurations. In all cases I run the Flume agent with these Java options:
"-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote

The JDK is 32-bit.

Experiment 1:
=====
RAM        : 16 GB
Processor  : Intel Xeon E5620 @ 2.40 GHz (16 cores).
             64-bit processor with 64-bit kernel.
Throughput : 2 MB/sec

Experiment 2:
=====
RAM        : 4 GB
Processor  : Intel Xeon E5504 @ 2.00 GHz (4 cores).
             64-bit processor with 32-bit kernel.
Throughput : 30 KB/sec

Experiment 3:
=====
RAM        : 8 GB
Processor  : Intel Xeon E5520 @ 2.27 GHz (16 cores).
             64-bit processor with 32-bit kernel.
Throughput : 80 KB/sec

-- As can be seen, there is a huge difference in throughput with the
same Flume configuration but different hardware.
-- In the first case, where throughput is highest, RES is around 160 MB;
in the other cases it is in the range of 40 MB - 50 MB.
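To put the gap in numbers, here is a small sketch that compares the three measured throughputs against Experiment 1. The variable names are illustrative only, and the conversion assumes 1 MB = 1000 KB.

```python
# Measured throughputs from the three experiments, in KB/sec.
# Assumption (not from the original post): 1 MB = 1000 KB, so 2 MB/sec = 2000 KB/sec.
measured_kb_per_sec = {
    "Experiment 1": 2000.0,
    "Experiment 2": 30.0,
    "Experiment 3": 80.0,
}

baseline = measured_kb_per_sec["Experiment 1"]

# How many times slower each run is compared with Experiment 1.
slowdown = {name: baseline / rate for name, rate in measured_kb_per_sec.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.1f}x slower than Experiment 1")
```

Under that assumption, Experiment 2 is roughly 67 times slower and Experiment 3 is 25 times slower than Experiment 1.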

Can anybody please give some insight into why there is such a huge
difference in throughput?
What is the correlation between RAM and File Channel/HDFS Sink
performance, and also with a 32-bit vs. 64-bit kernel?