Re: Flume Source and Sink in different hosts
Can you also send the logs of both agents? Does your HBase cluster actually have that table, with that column family?
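For example, you can verify from the HBase shell (table and family names taken from your sink config below):

    describe 'flumedemo'            # should list a 'testing' column family
    create 'flumedemo', 'testing'   # only if the table does not exist yet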

Also, are you sure the files are not getting rotated out from under tail? You should use tail -F so that the source keeps working even when files get rotated.
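In your HostA config that is a one-line change to the exec source command shown below:

    agent3.sources.tail.command = tail -F /var/log/auth.log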
Hari
--  
Hari Shreedharan
On Thursday, October 4, 2012 at 2:53 PM, Kumar, Suresh wrote:

> Hello:
>  
> I have just downloaded and built flume-ng (apache-flume-1.3.0-SNAPSHOT).
>  
> My goal is to collect log data on HostA (source) and send it to HostB (sink). My initial test, sending /etc/passwd
> from HostA to HostB, worked fine, and I was also able to load the passwd file into HBase on HostB.
>  
> Now I want to load a continuous stream of log data (using tail -f), but I have not been able to replicate the above process.
> Flume starts fine on HostA, but I do not see any data arriving on HostB or in HBase.
>  
> What is wrong with my configuration?
>  
> Thanks,
> Suresh
>  
> Here is my flume.conf in HostA
>  
> agent3.sources = tail
> agent3.channels = MemoryChannel-1
> agent3.sinks = avro-sink
>  
> # Define source flow
> agent3.sources.tail.type = exec
> agent3.sources.tail.command = tail -f /var/log/auth.log
> agent3.sources.tail.channels = MemoryChannel-1
>  
> # What kind of channel
> agent3.channels.MemoryChannel-1.type = memory
>  
> # avro sink properties
> agent3.sinks.avro-sink.type = avro
> agent3.sinks.avro-sink.channel = MemoryChannel-1
> agent3.sinks.avro-sink.hostname = hostb
> agent3.sinks.avro-sink.port = 41414
>  
> Here is my flume.conf in HostB
>  
> # Define a memory channel called ch1 on agent1
> agent1.channels.ch1.type = memory
>  
> # Define an Avro source called avro-source1 on agent1 and tell it
> # to bind to 0.0.0.0:41414. Connect it to channel ch1.
> agent1.sources.avro-source1.channels = ch1
> agent1.sources.avro-source1.type = avro
> agent1.sources.avro-source1.bind = 0.0.0.0
> agent1.sources.avro-source1.port = 41414
>  
> # Define a logger sink that simply logs all events it receives
> # and connect it to the other end of the same channel.
> agent1.sinks.log-sink1.channel = ch1
> agent1.sinks.log-sink1.type = logger
>  
> # Finally, now that we've defined all of our components, tell
> # agent1 which ones we want to activate.
> agent1.channels = ch1
> agent1.sources = avro-source1
> #agent1.sources = avro-source1
> agent1.sinks = sink1
>  
> agent1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
> agent1.sinks.sink1.channel = ch1
> agent1.sinks.sink1.table = flumedemo
> agent1.sinks.sink1.columnFamily = testing
> agent1.sinks.sink1.column = foo
> agent1.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
> agent1.sinks.sink1.serializer.payloadColumn = col1
> agent1.sinks.sink1.serializer.keyType = timestamp
> agent1.sinks.sink1.serializer.rowPrefix = 1
> agent1.sinks.sink1.serializer.suffix = timestamp
> agent1.sinks.sink1.serializer.payloadColumn = pcol
> agent1.sinks.sink1.serializer.incrementColumn = icol
>  
>  
>  
>  
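For reference, each host runs its own agent against its own config file; a typical launch looks like the following (a sketch: the conf directory layout and file names are assumptions, and the --name value must match the agent name used in each file):

    # On HostA (runs agent3)
    bin/flume-ng agent --conf conf --conf-file flume.conf --name agent3 -Dflume.root.logger=INFO,console

    # On HostB (runs agent1)
    bin/flume-ng agent --conf conf --conf-file flume.conf --name agent1 -Dflume.root.logger=INFO,console

Running with the console logger also makes it easy to capture the agent logs Hari asked for.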