

Re: Throughput of HDFSSink
Hi,

For performance testing I highly recommend
org.apache.flume.source.StressSource.

Perhaps try that?
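
A minimal sketch of what that could look like, assuming the agent1/ch1 names from the config quoted below. The source name stress-source1 and the values are only illustrative, and the size and maxTotalEvents properties should be checked against the user guide for your Flume version:

```
# Replace the exec source with StressSource so the source side
# is not the bottleneck while measuring the HDFS sink.
agent1.sources = stress-source1
agent1.sources.stress-source1.type = org.apache.flume.source.StressSource
agent1.sources.stress-source1.channels = ch1
# Payload size of each generated event, in bytes (illustrative value)
agent1.sources.stress-source1.size = 500
# Stop after this many events (illustrative value)
agent1.sources.stress-source1.maxTotalEvents = 1000000
```

The channel and sink definitions would stay as they are, so any change in throughput can be attributed to the source.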

Brock

On Thu, Nov 8, 2012 at 7:43 PM, Pankaj Gupta <[EMAIL PROTECTED]> wrote:

> Hi,
>
> What throughput can I expect when writing to the HDFS sink? Here is
> the Flume config I'm using:
>
> # The agent in this case is called 'agent1'
>
> # Define a memory channel called ch1 on agent1
> agent1.channels.ch1.type = memory
>
> # Define an exec source called exec-source1 on agent1 that runs
> # the command given below. Connect it to channel ch1.
> agent1.sources.exec-source1.channels = ch1
> agent1.sources.exec-source1.type = exec
> agent1.sources.exec-source1.restart = true
> agent1.sources.exec-source1.batchSize = 100
> agent1.sources.exec-source1.command = /home/ubuntu/flume/linesource.sh
>
> # Define an HDFS sink that writes all events it receives to HDFS
> # and connect it to the other end of the same channel.
> agent1.sinks.hdfs-sink1.channel = ch1
> agent1.sinks.hdfs-sink1.type = hdfs
> agent1.sinks.hdfs-sink1.hdfs.path = hdfs://ip-10-000-000-000.ec2.internal/user/ubuntu/event
> agent1.sinks.hdfs-sink1.hdfs.filePrefix = event
> agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text
> agent1.sinks.hdfs-sink1.hdfs.rollInterval = 60
> agent1.sinks.hdfs-sink1.hdfs.rollCount = 0
> agent1.sinks.hdfs-sink1.hdfs.rollSize = 0
> agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
> agent1.sinks.hdfs-sink1.hdfs.batchSize = 1000
>
> # Finally, now that we've defined all of our components, tell
> # agent1 which ones we want to activate.
> agent1.channels = ch1
> agent1.sources = exec-source1
> agent1.sinks = hdfs-sink1
>
>
> So far I only get about 20 Mb/min, or less than 1 Mb/sec. I am wondering how
> far it can be improved. Is there any benchmark on HDFS sink performance?
>
> Thanks in Advance,
> Pankaj
>
>
>
--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/