-
Flume Source and Sink in different hosts
Kumar, Suresh 2012-10-04, 21:53

Hello:

I have just downloaded and built flume-ng (apache-flume-1.3.0-SNAPSHOT).

My goal is to collect log data from HostA (source) and send it to HostB (sink). My initial test (sending /etc/passwd) from HostA to HostB worked fine, and I was also able to load the passwd file into my HBase on HostB.

Now I want to load a continuous stream of log data (using tail -f), but I was not able to replicate the above process. Flume started fine on HostA, but I do not see any data being received by HostB or in my HBase.

What is wrong with my configuration?

Thanks,
Suresh

Here is my flume.conf in HostA

agent3.sources = tail
agent3.channels = MemoryChannel-1
agent3.sinks = avro-sink

# Define source flow
agent3.sources.tail.type = exec
agent3.sources.tail.command = tail -f /var/log/auth.log
agent3.sources.tail.channels = MemoryChannel-1

# What kind of channel
agent3.channels.MemoryChannel-1.type = memory

# avro sink properties
agent3.sinks.avro-sink.type = avro
agent3.sinks.avro-sink.channel = MemoryChannel-1
agent3.sinks.avro-sink.hostname = hostb
agent3.sinks.avro-sink.port = 41414

Here is my flume.conf in HostB

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory

# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414

# Define a logger sink that simply logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = logger

# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
#agent1.sources = avro-source1
agent1.sinks = sink1

agent1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
agent1.sinks.sink1.channel = ch1
agent1.sinks.sink1.table = flumedemo
agent1.sinks.sink1.columnFamily = testing
agent1.sinks.sink1.column = foo
agent1.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
agent1.sinks.sink1.serializer.payloadColumn = col1
agent1.sinks.sink1.serializer.keyType = timestamp
agent1.sinks.sink1.serializer.rowPrefix = 1
agent1.sinks.sink1.serializer.suffix = timestamp
agent1.sinks.sink1.serializer.payloadColumn = pcol
agent1.sinks.sink1.serializer.incrementColumn = icol
-
Re: Flume Source and Sink in different hosts
Hari Shreedharan 2012-10-04, 22:02

Can you also send the logs of both agents? Does your HBase cluster have the table in question, and does that table have the column family?

Also, are you sure the files are not getting rotated out? You should use tail -F so that your setup keeps working even when files get rotated out.

Hari

--
Hari Shreedharan
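The tail -F suggestion amounts to a one-line change in the HostA source definition. A minimal sketch, reusing the agent3 and tail names from the configuration posted in this thread: with GNU tail, -F follows the file by name and retries if the file is replaced, whereas -f keeps following the original file after it has been rotated away.

```properties
# HostA flume.conf (fragment)
# Follow auth.log by name so that log rotation does not silently stall the source.
agent3.sources.tail.type = exec
agent3.sources.tail.command = tail -F /var/log/auth.log
agent3.sources.tail.channels = MemoryChannel-1
```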
-
RE: Flume Source and Sink in different hosts
Kumar, Suresh 2012-10-04, 22:19

Yes, my HBase has the table and column family; if I run the /etc/passwd test using the flume-ng client, the table gets populated.

Here is the log from the source agent. There is nothing much in the sink log except for the following lines, which seem to be benign.

Thanks,
Suresh

2012-10-04 14:59:05,622 (lifecycleSupervisor-1-0-SendThread(localhost:2181)) [DEBUG - org.apache.zookeeper.client.ZooKeeperSaslClient.clientTunneledAuthenticationInProgress(ZooKeeperSaslClient.java:515)] Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
2012-10-04 14:59:08,414 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)] Checking file:conf/flume.conf for changes

source agent log:

$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent3

+ exec /usr/lib/jvm/java-6-sun/bin/java -Xmx20m -Dflume.root.logger=DEBUG,console -cp '/opt/flume/conf:/opt/flume/lib/*' -Djava.library.path= org.apache.flume.node.Application -f conf/flume.conf -n agent3
2012-10-04 15:09:30,778 (main) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:67)] Starting lifecycle supervisor 1
2012-10-04 15:09:30,791 (main) [INFO - org.apache.flume.node.FlumeNode.start(FlumeNode.java:54)] Flume node starting - agent3
2012-10-04 15:09:30,799 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:203)] Node manager starting
2012-10-04 15:09:30,801 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:67)] Configuration provider starting
2012-10-04 15:09:30,810 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:67)] Starting lifecycle supervisor 10
2012-10-04 15:09:30,813 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:207)] Node manager started
2012-10-04 15:09:30,819 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:86)] Configuration provider started
2012-10-04 15:09:30,819 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)] Checking file:conf/flume.conf for changes
2012-10-04 15:09:30,821 (conf-file-poller-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:195)] Reloading configuration file:conf/flume.conf
2012-10-04 15:09:30,839 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:912)] Added sinks: avro-sink Agent: agent3
2012-10-04 15:09:30,840 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-04 15:09:30,840 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1002)] Created context for avro-sink: hostname
2012-10-04 15:09:30,841 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-04 15:09:30,841 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-04 15:09:30,841 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-04 15:09:30,841 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:295)] Starting validation of configuration for agent: agent3, initial-configuration: AgentConfiguration[agent3]
SOURCES: {tail={ parameters:{command=tail -F /var/log/auth.log, channels=MemoryChannel-1, type=exec} }}
CHANNELS: {MemoryChannel-1={ parameters:{type=memory} }}
SINKS: {avro-sink={ parameters:{port=41414, hostname=sig-flume, type=avro, channel=MemoryChannel-1} }}
2012-10-04 15:09:30,854 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:450)] Created channel MemoryChannel-1
2012-10-04 15:09:30,883 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:655)] Creating sink: avro-sink using AVRO
2012-10-04 15:09:30,885 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:353)] Post validation configuration for agent3
AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[agent3]
SOURCES: {tail={ parameters:{command=tail -F /var/log/auth.log, channels=MemoryChannel-1, type=exec} }}
CHANNELS: {MemoryChannel-1={ parameters:{type=memory} }}
SINKS: {avro-sink={ parameters:{port=41414, hostname=sig-flume, type=avro, channel=MemoryChannel-1} }}
2012-10-04 15:09:30,885 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:117)] Channels:MemoryChannel-1
2012-10-04 15:09:30,885 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:118)] Sinks avro-sink
2012-10-04 15:09:30,885 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:119)] Sources tail
2012-10-04 15:09:30,885 (conf-file-poller-0) [INFO - org.apache.flume
-
Re: Flume Source and Sink in different hosts
Hari Shreedharan 2012-10-04, 22:40

Looks like your agent was set up properly. Can you increase the heap and try again? You can do this by setting -Xmx in the flume-env.sh file. Try setting it to 1G or higher, since you are using the memory channel. Also, I assume the file you are tailing is actually being written to? I strongly suggest using the AsyncHBaseSink.

Thanks,
Hari

--
Hari Shreedharan
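As a rough illustration of the AsyncHBaseSink suggestion, the HostB sink definition might look like the fragment below. This is a sketch under assumptions, not a tested configuration: the agent, channel, table and column family names come from the configuration posted in this thread, while the choice of SimpleAsyncHbaseEventSerializer and the payloadColumn value are assumptions that should be checked against the Flume user guide for the build in use.

```properties
# HostB flume.conf (fragment): swap the HBaseSink for the AsyncHBaseSink
agent1.sinks = sink1
agent1.sinks.sink1.type = org.apache.flume.sink.hbase.AsyncHBaseSink
agent1.sinks.sink1.channel = ch1
agent1.sinks.sink1.table = flumedemo
agent1.sinks.sink1.columnFamily = testing
# Assumed: the async counterpart of the SimpleHbaseEventSerializer used earlier
agent1.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
# Assumed column name, carried over from the earlier configuration
agent1.sinks.sink1.serializer.payloadColumn = pcol
```

The heap increase is a separate change: it goes into flume-env.sh as a -Xmx setting, not into flume.conf.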
-
RE: Flume Source and Sink in different hosts
Kumar, Suresh 2012-10-04, 22:46

Hari, I just noticed some entries in HBase, so this configuration does work. I will retry with the changes you recommended. Do you think I should be using some other channel type instead of memory?

Thanks,
Suresh
-
Re: Flume Source and Sink in different hosts
Hari Shreedharan 2012-10-04, 23:25

It depends on what kind of guarantees you need. If you need to make sure your events are persisted even across process or system failures, you should use the File Channel; otherwise you can use the Memory Channel (the performance of the Memory Channel is obviously better).

Thanks,
Hari

--
Hari Shreedharan
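The trade-off between the two channel types can be sketched as two alternative channel definitions for the HostA agent. The agent name agent3 is taken from this thread; the channel names, capacity values and directory paths below are hypothetical and only illustrate the shape of each option. Only one of the two would be listed in agent3.channels and referenced by the source and sink.

```properties
# Option 1: memory channel. Fast, but events still in the channel are lost
# if the agent process or the machine goes down.
agent3.channels.mem-ch.type = memory
# Hypothetical sizes; tune to the event rate and the heap available
agent3.channels.mem-ch.capacity = 10000
agent3.channels.mem-ch.transactionCapacity = 100

# Option 2: file channel. Slower, but events are persisted to disk and
# survive a process or system failure.
agent3.channels.file-ch.type = file
# Hypothetical paths; choose directories that survive a reboot
agent3.channels.file-ch.checkpointDir = /var/lib/flume/checkpoint
agent3.channels.file-ch.dataDirs = /var/lib/flume/data
```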
-
RE: Flume Source and Sink in different hosts
Kumar, Suresh 2012-10-05, 18:07

I increased the heap size in source and sink to 1G, and I now use the AsyncHBaseSink in my sink agent configuration; it didn't make much of a difference.

I changed the channel in my source agent configuration from memory to file on HostA. I did not change my sink agent configuration on HostB (it is still set to Memory Channel). I still see the latency issue (BTW, the auth.log grows every second). However, I noticed that if I kill the agent on HostA (source) and restart it, I see entries in HBase.

Am I missing something? How often does the data get flushed from source to sink? Should the sink agent also use the same channel type (file)?

Here is my conf and log for HostA (source)

flume.conf (source)

agent3.sources = tail
agent3.channels = FileChannel-1
agent3.sinks = avro-sink

# Define source flow
agent3.sources.tail.type = exec
agent3.sources.tail.command = tail -F /var/log/auth.log
agent3.sources.tail.channels = FileChannel-1

# What kind of channel
agent3.channels.FileChannel-1.type = file
agent3.channels.FileChannel-1.checkpointDir = /tmp/checkpoint
agent3.channels.FileChannel-1.dataDirs = /tmp/data

# avro sink properties
agent3.sinks.avro-sink.type = avro
agent3.sinks.avro-sink.channel = FileChannel-1
agent3.sinks.avro-sink.hostname = sig-flume
agent3.sinks.avro-sink.port = 41414

Log (source)

2012-10-05 10:49:03,736 (main) [INFO - org.apache.flume.node.FlumeNode.start(FlumeNode.java:54)] Flume node starting - agent3
2012-10-05 10:49:03,752 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:203)] Node manager starting
2012-10-05 10:49:03,752 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:67)] Configuration provider starting
2012-10-05 10:49:03,760 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:67)] Starting lifecycle supervisor 12
2012-10-05 10:49:03,763 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:207)] Node manager started
2012-10-05 10:49:03,767 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:86)] Configuration provider started
2012-10-05 10:49:03,769 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)] Checking file:conf/flume.conf for changes
2012-10-05 10:49:03,772 (conf-file-poller-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:195)] Reloading configuration file:conf/flume.conf
2012-10-05 10:49:03,801 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:912)] Added sinks: avro-sink Agent: agent3
2012-10-05 10:49:03,802 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-05 10:49:03,803 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1002)] Created context for avro-sink: hostname
2012-10-05 10:49:03,803 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-05 10:49:03,803 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-05 10:49:03,804 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-05 10:49:03,805 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:295)] Starting validation of configuration for agent: agent3, initial-configuration: AgentConfiguration[agent3]
SOURCES: {tail={ parameters:{command=tail -F /var/log/auth.log, channels=FileChannel-1, type=exec} }}
CHANNELS: {FileChannel-1={ parameters:{checkpointDir=/tmp/checkpoint, dataDirs=/tmp/data, type=file} }}
SINKS: {avro-sink={ parameters:{port=41414, hostname=sig-flume, type=avro, channel=FileChannel-1} }}
2012-10-05 10:49:03,823 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:450)] Created channel FileChannel-1
2012-10-05 10:49:03,850 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:655)] Creating sink: avro-sink using AVRO
2012-10-05 10:49:03,860 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:353)] Post validation configuration for agent3
AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[agent3]
SOURCES: {tail={ parameters:{command=tail -F /var/log/auth.log, channels=FileChannel-1, type=exec} }}
CHANNELS: {FileChannel-1={ parameters:{checkpointDir=/tmp/checkpoint, dataDirs=/tmp/data, type=file} }}
SINKS: {avro-sink={ parameters:{port=41414, hostname=sig-flume, type=avro, channel=FileChannel-1} }}
2012-10-05 10:49:03,860 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:117)] Channels:FileChannel-1
2012-10-05 10:49:03,861 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:118)] Sinks avro-sink
2012-10-05 10:49:03,863 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.va
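One setting that may bear on the flush question is batching. The Flume user guide documents a batchSize property on the exec source and a batch-size property on the Avro sink; whether they explain the delay seen here, and whether the 1.3.0-SNAPSHOT build in use honors them, are assumptions. The values below are illustrative only, a sketch of what one might try on HostA.

```properties
# HostA flume.conf (fragment)
# Assumption: smaller batches so that a slowly growing log is forwarded sooner,
# at the cost of more, smaller transactions.
agent3.sources.tail.batchSize = 1
agent3.sinks.avro-sink.batch-size = 10
```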
-
RE: Flume Source and Sink in different hostsKumar, Suresh 2012-10-05, 18:27
Just a quick update: it is definitely a source issue and has nothing to do with the Flume configuration in the sink.

I restarted the sink and I do not see the data in HBase. If I stop the agent on the source, I do not see any data either, but as soon as I start the agent on the source again, I see the data in my HBase, which is on HostB.

Thanks for any help,
Suresh

From: Kumar, Suresh [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 05, 2012 11:08 AM
To: [EMAIL PROTECTED]
Subject: RE: Flume Source and Sink in different hosts

I increased the heap size on the source and the sink to 1G, and I now use the AsyncHBaseSink in my sink agent configuration; it didn’t make much of a difference.

I changed the channel in my source agent configuration from memory to file on HostA. I did not change my sink agent configuration on HostB (it is still set to Memory Channel). I still see the latency issue (BTW, auth.log grows every second). However, I noticed that if I kill the agent on HostA (source) and restart it, I see entries in HBase. Am I missing something? How often does the data get flushed from source to sink? Should the sink also use the same channel type (file)?
Here is my conf and log for HostA (source)

flume.conf (source)

agent3.sources = tail
agent3.channels = FileChannel-1
agent3.sinks = avro-sink

# Define source flow
agent3.sources.tail.type = exec
agent3.sources.tail.command = tail -F /var/log/auth.log
agent3.sources.tail.channels = FileChannel-1

# What kind of channel
agent3.channels.FileChannel-1.type = file
agent3.channels.FileChannel-1.checkpointDir = /tmp/checkpoint
agent3.channels.FileChannel-1.dataDirs = /tmp/data

# avro sink properties
agent3.sinks.avro-sink.type = avro
agent3.sinks.avro-sink.channel = FileChannel-1
agent3.sinks.avro-sink.hostname = sig-flume
agent3.sinks.avro-sink.port = 41414

Log (source)

2012-10-05 10:49:03,736 (main) [INFO - org.apache.flume.node.FlumeNode.start(FlumeNode.java:54)] Flume node starting - agent3
2012-10-05 10:49:03,752 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:203)] Node manager starting
2012-10-05 10:49:03,752 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:67)] Configuration provider starting
2012-10-05 10:49:03,760 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:67)] Starting lifecycle supervisor 12
2012-10-05 10:49:03,763 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:207)] Node manager started
2012-10-05 10:49:03,767 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:86)] Configuration provider started
2012-10-05 10:49:03,769 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)] Checking file:conf/flume.conf for changes
2012-10-05 10:49:03,772 (conf-file-poller-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:195)] Reloading configuration file:conf/flume.conf
2012-10-05 10:49:03,801 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:912)] Added sinks: avro-sink Agent: agent3
2012-10-05 10:49:03,802 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-05 10:49:03,803 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1002)] Created context for avro-sink: hostname
2012-10-05 10:49:03,803 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-05 10:49:03,803 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-05 10:49:03,804 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:998)] Processing:avro-sink
2012-10-05 10:49:03,805 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:295)] Starting validation of configuration for agent: agent3, initial-configuration: AgentConfiguration[agent3] SOURCES: {tail={ parameters:{command=tail -F /var/log/auth.log, channels=FileChannel-1, type=exec} }} CHANNELS: {FileChannel-1={ parameters:{checkpointDir=/tmp/checkpoint, dataDirs=/tmp/data, type=file} }} SINKS: {avro-sink={ parameters:{port=41414, hostname=sig-flume, type=avro, channel=FileChannel-1} }}
2012-10-05 10:49:03,823 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:450)] Created channel FileChannel-1
2012-10-05 10:49:03,850 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:655)] Creating sink: avro-sink using AVRO
2012-10-05 10:49:03,860 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:353)] Post validation configuration for agent3 AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[agent3] SOURCES: {tail={ parameters:{command=tail -F /var/log/auth.log, channels=FileChannel-1, type=exec} }} CHANNELS: {FileChannel-1={ parameters:{checkpointDir=/tmp/checkpoint, dataDirs=/tmp/data, type=file} }} SINKS:
-
Re: Flume Source and Sink in different hosts
Hari Shreedharan 2012-10-05, 18:40
Ah, it seems this is because your file is not growing fast enough. The exec source does some "batching": it waits for around 20 lines to come in before writing them out to the channel. This is important to avoid hurting the performance of channels like File Channel. Can you add this to your source config:

batchSize = 1

If you set the batch size to 1, I would not recommend using File Channel, because there will be far too many IO ops to give good performance. You should use Memory Channel; of course, the data will then not survive a program or system crash. If you want to use File Channel, I'd suggest a batchSize of 100 or so.

Thanks,
Hari

--
Hari Shreedharan
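For readers applying this advice to the configuration posted earlier in the thread, the fragment below is a minimal sketch of how the change might look. It assumes the setting is written with the same agent3.sources.tail prefix as the other source properties (the reply itself only says "batchSize = 1"), and it reuses the MemoryChannel-1 name from the first message; neither line was posted verbatim by either author.

```properties
# Sketch only: fully-qualified key names are an assumption based on the
# naming pattern of the other properties in this thread.

# Low-traffic log: hand each line to the channel as soon as it arrives.
agent3.sources.tail.type = exec
agent3.sources.tail.command = tail -F /var/log/auth.log
agent3.sources.tail.batchSize = 1
agent3.sources.tail.channels = MemoryChannel-1

# Memory Channel avoids a disk write per event, but buffered events
# are lost if the agent or the machine crashes.
agent3.channels = MemoryChannel-1
agent3.channels.MemoryChannel-1.type = memory
agent3.sinks.avro-sink.channel = MemoryChannel-1

# Alternative suggested in the reply: keep the File Channel and use a
# larger batch instead, for example:
# agent3.sources.tail.batchSize = 100
```

The trade-off is the one described in the reply: a batch size of 1 minimizes latency on a slow-growing file but multiplies channel operations, which is why it is paired with the Memory Channel here rather than the File Channel.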
-
RE: Flume Source and Sink in different hosts
Kumar, Suresh 2012-10-05, 21:55
Yes, that was it. I moved the sink to a different server where auth.log had more traffic, and I now see my data being pushed to HBase.

Thanks for your help.
Suresh
|