Flume, mail # user - Flume Source and Sink in different hosts


Re: Flume Source and Sink in different hosts
Hari Shreedharan 2012-10-05, 18:40
Ah, it seems this is because your file is not growing very fast. The exec source does some batching: it waits for around 20 lines to come in before writing them out to the channel. This is important to avoid hurting the performance of channels like File Channel. Can you add this to your source config:
batchSize = 1
If you set the batch size to 1, I would not recommend using File Channel, because there will be far too many IO ops to give good performance. You should use Memory Channel instead; of course, the data will then not survive a program or system crash. If you want to use File Channel, I'd suggest a batchSize of 100 or so.
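
For illustration, the two options might look roughly like this in the source agent's flume.conf. This is only a sketch: it reuses the agent3, tail, FileChannel-1 and avro-sink names from the configuration quoted below, while the MemChannel-1 name and the capacity value are made up for the example and would need adjusting to your setup. Use one option or the other, not both.

```
# Option 1: low latency, not durable.
# Every line is written to the channel as soon as it arrives.
# MemChannel-1 and its capacity are hypothetical, for illustration only.
agent3.channels = MemChannel-1
agent3.channels.MemChannel-1.type = memory
agent3.channels.MemChannel-1.capacity = 10000
agent3.sources.tail.channels = MemChannel-1
agent3.sources.tail.batchSize = 1
agent3.sinks.avro-sink.channel = MemChannel-1

# Option 2: durable, higher latency on a slow-growing file.
# Keep the existing FileChannel-1 definition and only raise the batch size.
# agent3.sources.tail.batchSize = 100
```

With option 2 on a file that grows about one line per second, expect events to reach the sink in bursts rather than continuously.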
Thanks,
Hari

--  
Hari Shreedharan
On Friday, October 5, 2012 at 11:27 AM, Kumar, Suresh wrote:

> Just a quick update: it is definitely a source issue and has nothing to do with the Flume configuration in the sink.
>  
> I restarted the sink and do not see the data in HBase. If I stop the agent in source, I do not see
> any data either, but as soon as I start the agent in source, I see the data in my HBase, which is on HostB.
>  
> Thanks for any help,
> Suresh
>  
>  
> From: Kumar, Suresh [mailto:[EMAIL PROTECTED]]  
> Sent: Friday, October 05, 2012 11:08 AM
> To: [EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])
> Subject: RE: Flume Source and Sink in different hosts
>  
> I increased the heap size in source and sink to 1G, and I now use the AsyncHBaseSink in my sink agent configuration; it didn’t make
> that much of a difference.
>  
> I changed the channel in my source agent configuration from memory to file on HostA; I did not change my sink agent configuration on HostB
> (it is still set to Memory Channel). I still see the latency issue (BTW, the auth.log grows every second). However, I noticed
> that if I kill the agent on HostA (source) and restart it, I see entries in HBase. Am I missing something? How often does the data
> get flushed from source to sink? Should the sink also use the same channel type (file)?
>  
> Here is my conf and log for HostA (source)
>  
> flume.conf (source)
>  
> agent3.sources = tail
> agent3.channels = FileChannel-1
> agent3.sinks = avro-sink
>  
> # Define source flow
> agent3.sources.tail.type = exec
> agent3.sources.tail.command = tail -F /var/log/auth.log
> agent3.sources.tail.channels = FileChannel-1
>  
> # What kind of channel
> agent3.channels.FileChannel-1.type = file
> agent3.channels.FileChannel-1.checkpointDir = /tmp/checkpoint
> agent3.channels.FileChannel-1.dataDirs = /tmp/data
>  
> # avro sink properties
> agent3.sinks.avro-sink.type = avro
> agent3.sinks.avro-sink.channel = FileChannel-1
> agent3.sinks.avro-sink.hostname = sig-flume
> agent3.sinks.avro-sink.port = 41414
>  
>  
> Log (source)
>  
>  
> 2012-10-05 10:49:03,736 (main) [INFO - org.apache.flume.node.FlumeNode.start(FlumeNode.java:54)] Flume node starting - agent3
> 2012-10-05 10:49:03,752 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:203)] Node manager starting
> 2012-10-05 10:49:03,752 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:67)] Configuration provider starting
> 2012-10-05 10:49:03,760 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:67)] Starting lifecycle supervisor 12
> 2012-10-05 10:49:03,763 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:207)] Node manager started
> 2012-10-05 10:49:03,767 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:86)] Configuration provider started
> 2012-10-05 10:49:03,769 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)] Checking file:conf/flume.conf for changes