Flume >> mail # user >> HBaseSink is very slow


Kumar, Deepak8 2013-07-29, 17:55
Re: HBaseSink is very slow
Hi Deepak,

1. When using the load-balancing sink group, the sinks in the group are
processed serially, not in parallel.

2. The batch size on your source is very small.
agent.sources.1374869469492.batchSize = 1
You may try increasing that for better throughput.

3. The AsyncHBaseSink is going to be more performant than the HBaseSink.

4. I would also recommend using the spooling directory source (spooldir)
instead of an exec source tailing a file.
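
As a rough sketch of points 2 to 4 (untested, not a tuned configuration), the relevant properties might look like the fragment below. The batch sizes, the spool path, and the choice of source1 as the source to convert are assumptions for illustration only. The stock SimpleAsyncHbaseEventSerializer stands in as a placeholder, since the async sink expects an AsyncHbaseEventSerializer and the custom LogHbaseEventSerializer would need to be ported.

```properties
# Sketch only: values and paths are illustrative assumptions.

# Point 2: larger source batch. Keep it at or below the channel's
# transactionCapacity (1000 in the quoted config).
agent.sources.1374869469492.batchSize = 500

# Point 3: AsyncHBaseSink in place of HBaseSink (repeat for each sink).
agent.sinks.hbaseSink1.type = org.apache.flume.sink.hbase.AsyncHBaseSink
agent.sinks.hbaseSink1.table = elf_log
agent.sinks.hbaseSink1.columnFamily = content
# Placeholder serializer; a port of the custom serializer would go here.
agent.sinks.hbaseSink1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
agent.sinks.hbaseSink1.batchSize = 200
agent.sinks.hbaseSink1.channel = fileChannel

# Point 4: spooling directory source instead of exec + tail.
# /var/log/myapp/spool is a hypothetical directory that completed
# log files are moved into.
agent.sources.source1.type = spooldir
agent.sources.source1.spoolDir = /var/log/myapp/spool
agent.sources.source1.batchSize = 500
agent.sources.source1.channels = fileChannel
```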

-Jeff
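
P.S. On point 1: my understanding is that sinks outside a sink group each get their own runner thread, so one way to have the five sinks drain the channel concurrently may be to drop the group. A minimal sketch, assuming nothing else in the agent depends on sinkGroup1:

```properties
# Sketch only: the same five sinks, with no sink group defined.
agent.sinks = hbaseSink1 hbaseSink2 hbaseSink3 hbaseSink4 hbaseSink5

# Removed (assumption: nothing else relies on the group):
# agent.sinkgroups = sinkGroup1
# agent.sinkgroups.sinkGroup1.sinks = hbaseSink1 hbaseSink2 hbaseSink3 hbaseSink4 hbaseSink5
# agent.sinkgroups.sinkGroup1.processor.type = load_balance

# Each sink keeps reading from the shared channel as before.
agent.sinks.hbaseSink1.channel = fileChannel
agent.sinks.hbaseSink2.channel = fileChannel
agent.sinks.hbaseSink3.channel = fileChannel
agent.sinks.hbaseSink4.channel = fileChannel
agent.sinks.hbaseSink5.channel = fileChannel
```

This gives up the group's failover and backoff behaviour, so it is worth testing before relying on it.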

On Mon, Jul 29, 2013 at 10:55 AM, Kumar, Deepak8 <[EMAIL PROTECTED]> wrote:
> Hi,
> Could you please tell me the optimum number of log events the HBaseSink can process per second? Currently my application generates 5000 log events/second, but there is a large backlog and it seems the HBaseSink is not processing even 300 log events/second.
>
> I have configured a sink group with 5 HBase sinks in load_balance fashion. Would all 5 HBase sinks in the sink group execute in parallel?
>
> Here is my flume-conf.properties file:
>
> agent.sources =     source1 1374869469492 1374947746264 1374947757841 1374947770965 1374948166450 1374948182966 1374948198075 1374948216652 1374948231355 1374948246308 1374948260698
> agent.channels = fileChannel
> agent.sinks = hbaseSink1 hbaseSink2 hbaseSink3 hbaseSink4 hbaseSink5
> agent.sinkgroups = sinkGroup1
> agent.sinkgroups.sinkGroup1.sinks = hbaseSink1 hbaseSink2 hbaseSink3 hbaseSink4 hbaseSink5
> agent.sinkgroups.sinkGroup1.processor.type = load_balance
>
> # Channel's type is defined.
> agent.channels.fileChannel.type = file
> agent.channels.fileChannel.checkpointDir = /var/log/flume-ng/file-channel/checkpoint
> agent.channels.fileChannel.dataDirs = /var/log/flume-ng/file-channel/data
> agent.channels.fileChannel.transactionCapacity = 1000
> agent.channels.fileChannel.checkpointInterval = 30000
> agent.channels.fileChannel.maxFileSize = 2146435071
> agent.channels.fileChannel.minimumRequiredSpace = 524288000
> agent.channels.fileChannel.keep-alive = 5
> #agent.channels.fileChannel.write-timeout = 10
> agent.channels.fileChannel.write-timeout = 50
> agent.channels.fileChannel.checkpoint-timeout = 600
> agent.channels.fileChannel.capacity = 50000000
>
> #HBaseSink1
> agent.sinks.hbaseSink1.type = org.apache.flume.sink.hbase.HBaseSink
> agent.sinks.hbaseSink1.table=elf_log
> agent.sinks.hbaseSink1.columnFamily=content
> agent.sinks.hbaseSink1.serializer=com.citi.sponge.flume.collector.sink.LogHbaseEventSerializer
> agent.sinks.hbaseSink1.batchSize=200
> agent.sinks.hbaseSink1.channel = fileChannel
>
> #HBaseSink2
> agent.sinks.hbaseSink2.type = org.apache.flume.sink.hbase.HBaseSink
> agent.sinks.hbaseSink2.table=elf_log
> agent.sinks.hbaseSink2.columnFamily=content
> agent.sinks.hbaseSink2.serializer=com.citi.sponge.flume.collector.sink.LogHbaseEventSerializer
> agent.sinks.hbaseSink2.batchSize=200
> agent.sinks.hbaseSink2.channel = fileChannel
>
> #HBaseSink3
> agent.sinks.hbaseSink3.type = org.apache.flume.sink.hbase.HBaseSink
> agent.sinks.hbaseSink3.table=elf_log
> agent.sinks.hbaseSink3.columnFamily=content
> agent.sinks.hbaseSink3.serializer=com.citi.sponge.flume.collector.sink.LogHbaseEventSerializer
> agent.sinks.hbaseSink3.batchSize=200
> agent.sinks.hbaseSink3.channel = fileChannel
>
> #HBaseSink4
> agent.sinks.hbaseSink4.type = org.apache.flume.sink.hbase.HBaseSink
> agent.sinks.hbaseSink4.table=elf_log
> agent.sinks.hbaseSink4.columnFamily=content
> agent.sinks.hbaseSink4.serializer=com.citi.sponge.flume.collector.sink.LogHbaseEventSerializer
> agent.sinks.hbaseSink4.batchSize=200
> agent.sinks.hbaseSink4.channel = fileChannel
>
> #HBaseSink5
> agent.sinks.hbaseSink5.type = org.apache.flume.sink.hbase.HBaseSink
> agent.sinks.hbaseSink5.table=elf_log
> agent.sinks.hbaseSink5.columnFamily=content
> agent.sinks.hbaseSink5.serializer=com.citi.sponge.flume.collector.sink.LogHbaseEventSerializer
> agent.sinks.hbaseSink5.batchSize=200
> agent.sinks.hbaseSink5.channel = fileChannel
>
>
> agent.sources.1374869469492.batchSize = 1
> agent.sources.1374869469492.channels = fileChannel