Jagadish Bihani 2012-10-09, 07:46
Brock Noland 2012-10-09, 14:31
Re: Flume throughput correlation with RAM
Jagadish Bihani 2012-10-10, 10:11
Hi

Thanks for the inputs, Brock.

After several experiments, the problem eventually boiled down to the disks.

-- I used the same configuration on all 3 machines, so all software components are identical.
-- The User Guide says that if multiple file channel instances are active on the same agent, different disks are preferable. But in my case *only one file channel is active per agent.*
-- The only pattern I observed is that the machines with better performance have multiple disks. But I don't understand how that helps if I have only 1 active file channel.
-- What is the impact of the type of disk/disk device driver on performance? I don't understand why I get 40 KB/sec with one disk and 2 MB/sec with another.

Could you please elaborate on the correlation between the file channel and disks?

Regards,
Jagadish

On 10/09/2012 08:01 PM, Brock Noland wrote:
> Hi,
>
> Using file channel, in terms of performance, the number and type of
> disks is going to be much more predictive of performance than CPU or
> RAM. Note that consumer level drives/controllers will give you much
> "better" performance because they lie to you about when your data is
> actually written to the drive. If you search for "fsync lies" you'll
> find more information on this.
>
> You probably want to increase the batch size to get better performance.
>
> Brock
>
> On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
> <[EMAIL PROTECTED]> wrote:
>> Hi
>>
>> My flume setup is:
>>
>> Source Agent : cat source - File Channel - Avro Sink
>> Dest Agent : avro source - File Channel - HDFS Sink.
>>
>> There is only 1 source agent and 1 destination agent.
>>
>> I measure throughput as amount of data written to HDFS per second.
>> ( I have rolling interval 30 sec; so if a 60 MB file is generated in 30 sec
>> the throughput is : -- 2 MB/sec ).
>>
>> I have run source agent on various machines with different hardware
>> configurations :
>> (In all cases I run flume agent with JAVA OPTIONS as
>> "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
>> -XX:MaxDirectMemorySize=2g")
>>
>> JDK is 32 bit.
>>
>> Experiment 1:
>> =====
>> RAM : 16 GB
>> Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
>> 64 bit Processor with 64 bit Kernel.
>> Throughput: 2 MB/sec
>>
>> Experiment 2:
>> =====
>> RAM : 4 GB
>> Processor: Intel Xeon E5504 @ 2.00GHz (4 cores). 32 bit Processor
>> 64 bit Processor with 32 bit Kernel.
>> Throughput : 30 KB/sec
>>
>> Experiment 3:
>> =====
>> RAM : 8 GB
>> Processor: Intel Xeon E5520 @ 2.27 GHz (16 cores). 32 bit Processor
>> 64 bit Processor with 32 bit Kernel.
>> Throughput : 80 KB/sec
>>
>> -- So as can be seen there is a huge difference in the throughput with the
>> same configuration but different hardware.
>> -- In the first case, where throughput is higher, RES is around 160 MB; in
>> the other cases it is in the range of 40 MB - 50 MB.
>>
>> Can anybody please give insights into why there is this huge difference in
>> the throughput?
>> What is the correlation between RAM and filechannel/HDFS sink performance
>> and also with 32-bit/64 bit kernel?
>>
>> Regards,
>> Jagadish
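
Brock's two points (disk sync cost dominates, and larger batches help) can be illustrated with a rough sketch. This is not Flume code. It assumes, for illustration only, that a durable channel forces data to disk once per committed batch, so fewer and larger batches mean fewer fsync calls. The helper names `write_batches` and `throughput_mb_per_s`, the 500-byte event size, and the batch sizes are all hypothetical. Absolute numbers depend entirely on the disk and controller, which is the point of the comparison.

```python
import os
import tempfile
import time


def write_batches(path, payload, total_events, batch_size, sync=True):
    """Write total_events copies of payload, syncing once per batch.

    Returns (bytes_written, fsync_calls). Hypothetical helper, not a Flume API.
    """
    written = 0
    fsyncs = 0
    with open(path, "wb") as f:
        for i in range(total_events):
            f.write(payload)
            written += len(payload)
            # At the end of each batch (and at the end of the input),
            # flush Python's buffer and ask the OS to force data to disk.
            if (i + 1) % batch_size == 0 or i + 1 == total_events:
                f.flush()
                if sync:
                    os.fsync(f.fileno())
                    fsyncs += 1
    return written, fsyncs


def throughput_mb_per_s(num_bytes, seconds):
    """Same measure as in the thread: 60 MB written in 30 s is 2 MB/s."""
    return num_bytes / (1024 * 1024) / seconds


if __name__ == "__main__":
    event = b"x" * 500  # assumed event size, for illustration only
    with tempfile.TemporaryDirectory() as d:
        for batch in (1, 10, 100):
            path = os.path.join(d, f"batch_{batch}.log")
            start = time.perf_counter()
            n, syncs = write_batches(path, event, 500, batch)
            elapsed = time.perf_counter() - start
            rate = throughput_mb_per_s(n, elapsed)
            print(f"batch={batch:4d} fsyncs={syncs:4d} {rate:8.2f} MB/s")
```

Running this on each machine's data disk gives a crude comparison of how expensive a sync is on that hardware. A drive or controller that acknowledges fsync before the data is actually persisted will look much faster at small batch sizes.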
Brock Noland 2012-10-10, 15:54
Jagadish Bihani 2012-10-10, 16:00
Brock Noland 2012-10-10, 16:05
Jagadish Bihani 2012-10-10, 16:22
Brock Noland 2012-10-10, 18:00