Re: HBase Sink Reliability
Hi Dave,

You are on the right track with your thoughts here.
The best way to ensure that all events are also delivered to HBase
would be to use a separate channel for the HBase sink.

-Jeff
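
A minimal sketch of the separate-channel layout suggested above, assuming the collector agent from the quoted configuration below. Sinks that read from the same channel each remove the events they take, so an event consumed by one sink is never seen by the other. Giving each sink its own channel and letting the source replicate into both avoids that. The channel names hdfsChannel and hbaseChannel and the directory paths are hypothetical; only the lines that would change are shown, and the remaining source and sink settings stay as in the original configuration.

```properties
# Sketch only: channel names and paths below are assumptions, not from the thread.
collector.sources = AvroIn
collector.channels = hdfsChannel hbaseChannel
collector.sinks = hdfsSink hbaseSink

# The source writes every event to both channels.
# "replicating" is the default selector; it is spelled out here for clarity.
collector.sources.AvroIn.channels = hdfsChannel hbaseChannel
collector.sources.AvroIn.selector.type = replicating

# Each file channel needs its own checkpoint and data directories.
collector.channels.hdfsChannel.type = file
collector.channels.hdfsChannel.checkpointDir = /opt/flume/hdfs-channel/checkpoint
collector.channels.hdfsChannel.dataDirs = /opt/flume/hdfs-channel/data

collector.channels.hbaseChannel.type = file
collector.channels.hbaseChannel.checkpointDir = /opt/flume/hbase-channel/checkpoint
collector.channels.hbaseChannel.dataDirs = /opt/flume/hbase-channel/data

# Each sink drains its own channel instead of competing for a shared one.
collector.sinks.hdfsSink.channel = hdfsChannel
collector.sinks.hbaseSink.channel = hbaseChannel
```

With this layout each sink works through its own copy of the event stream, so the HBase sink should receive the same events that reach HDFS.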
On Mon, Apr 22, 2013 at 8:11 AM, David Quigley <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am using Flume to write events from a web server to both HDFS and HBase.
> All events are being written to HDFS, but only about half are making it into
> HBase. Is there anything in my configuration that would be causing the
> issue? I have both the HDFS and HBase sinks reading from the same File Channel.
> Is it better to have one channel per sink?
>
> Thanks,
> Dave
>
>
> # flume config on web server
> agent.sources = sourceLog
> agent.sources.sourceLog.type = exec
> agent.sources.sourceLog.command = tail -F /var/log/clickServer/clicks_out
> agent.sources.sourceLog.batchSize = 100
> agent.sources.sourceLog.channels = fileChannel
>
> agent.sources.sourceLog.interceptors = itime ihost idatatype idataparent
> agent.sources.sourceLog.interceptors.itime.type = timestamp
> agent.sources.sourceLog.interceptors.ihost.type = host
> agent.sources.sourceLog.interceptors.ihost.useIP = false
> agent.sources.sourceLog.interceptors.ihost.hostHeader = host
> agent.sources.sourceLog.interceptors.idatatype.type = static
> agent.sources.sourceLog.interceptors.idatatype.key = data_type
> agent.sources.sourceLog.interceptors.idatatype.value = clicks
> agent.sources.sourceLog.interceptors.idataparent.type = static
> agent.sources.sourceLog.interceptors.idataparent.key = data_parent
> agent.sources.sourceLog.interceptors.idataparent.value = *
>
> agent.channels = fileChannel
> agent.channels.fileChannel.type = file
> agent.channels.fileChannel.transactionCapacity = 100
> agent.channels.fileChannel.checkpointDir = /opt/flume/file-channel/checkpoint
> agent.channels.fileChannel.dataDirs = /opt/flume/file-channel/data
>
> agent.sinks = AvroSink_main AvroSink_backup_1 AvroSink_backup_2 AvroSink_backup_3
> agent.sinks.AvroSink_main.type = avro
> agent.sinks.AvroSink_main.channel = fileChannel
> agent.sinks.AvroSink_main.hostname = *
> agent.sinks.AvroSink_main.port = 35873
> agent.sinks.AvroSink_main.batchSize = 100
> agent.sinks.AvroSink_backup_1.type = avro
> agent.sinks.AvroSink_backup_1.channel = fileChannel
> agent.sinks.AvroSink_backup_1.hostname = *
> agent.sinks.AvroSink_backup_1.port = 35873
> agent.sinks.AvroSink_backup_1.batchSize = 100
> agent.sinks.AvroSink_backup_2.type = avro
> agent.sinks.AvroSink_backup_2.channel = fileChannel
> agent.sinks.AvroSink_backup_2.hostname = *
> agent.sinks.AvroSink_backup_2.port = 35873
> agent.sinks.AvroSink_backup_2.batchSize = 100
> agent.sinks.AvroSink_backup_3.type = avro
> agent.sinks.AvroSink_backup_3.channel = fileChannel
> agent.sinks.AvroSink_backup_3.hostname = *
> agent.sinks.AvroSink_backup_3.port = 35873
> agent.sinks.AvroSink_backup_3.batchSize = 100
> agent.sinkgroups = failover
> agent.sinkgroups.failover.sinks = AvroSink_main AvroSink_backup_1 AvroSink_backup_2 AvroSink_backup_3
> agent.sinkgroups.failover.processor.type = failover
> agent.sinkgroups.failover.processor.priority.AvroSink_main = 10
> agent.sinkgroups.failover.processor.priority.AvroSink_backup_1 = 5
> agent.sinkgroups.failover.processor.priority.AvroSink_backup_2 = 3
> agent.sinkgroups.failover.processor.priority.AvroSink_backup_3 = 1
> agent.sinkgroups.failover.processor.maxpenalty = 10000
>
>
>
> # flume config on hadoop cluster
> collector.sources=AvroIn
> collector.sources.AvroIn.type=avro
> collector.sources.AvroIn.bind=0.0.0.0
> collector.sources.AvroIn.port=35873
> collector.sources.AvroIn.channels=fileChannel
>
> collector.channels=fileChannel
> collector.channels.fileChannel.type=FILE
> collector.channels.fileChannel.capacity=1000
> collector.channels.fileChannel.checkpointDir=~/.flume/file-channel/checkpoint_%{data_type}
> collector.channels.fileChannel.dataDirs=~/.flume/file-channel/data_%{data_type}
>
> collector.sinks=hbaseSink hdfsSink