Flume >> mail # user >> FW: Flume - HTTPSource & HDFSSink


FW: Flume - HTTPSource & HDFSSink
Hello,

I have the following configuration:

agent.sources = httpSrc
agent.channels = memoryChannel
agent.sinks = hdfsSink

# For each one of the sources, the type is defined
agent.sources.httpSrc.type = org.apache.flume.source.http.HTTPSource
agent.sources.httpSrc.port = 9000
agent.sources.httpSrc.handler = org.apache.flume.source.http.JSONHandler

# The channel can be defined as follows.
agent.sources.httpSrc.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://10.187.142.39/flume/
agent.sinks.hdfsSink.fileType = DataStream
agent.sinks.hdfsSink.writeFormat = Text
agent.sinks.hdfsSink.serializer = Text

#Specify the channel the sink should use
agent.sinks.hdfsSink.channel = memoryChannel
agent.sinks.logSink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 100

When I execute the following command, it generates a file in the /flume folder.

curl -X POST -d '[{ "headers" : { "timestamp" : "434324343", "host" : "random_host.example.com" }, "body" : "random_body" }, { "headers" : { "namenode" : "namenode.example.com", "datanode" : "random_datanode.example.com" }, "body" : "really_random_body" }]' 10.187.142.125:9000

However, the file contents are as follows, and the file is in a binary format rather than plain text.

SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritableÃç£idYvQS¸/\Á=ãCw
                                                                                         random_bod=«9really_random_body
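
The SEQ header and the LongWritable/BytesWritable class names in this output are what a Hadoop SequenceFile looks like, and SequenceFile is the HDFS sink's default hdfs.fileType. One possible cause, offered as an assumption rather than a confirmed diagnosis, is that the fileType and writeFormat keys in the configuration lack the hdfs. prefix, so the sink ignores them and falls back to its defaults. A minimal sketch of the sink section with the prefixed keys, reusing the names and path from the configuration:

```properties
# Assumed correction: HDFS-sink-specific keys carry the "hdfs." prefix.
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://10.187.142.39/flume/
# DataStream writes the event bodies as-is instead of wrapping them
# in the default SequenceFile container.
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.writeFormat = Text
agent.sinks.hdfsSink.channel = memoryChannel
```

If this assumption holds, files written after the agent is restarted with these keys should contain the two event bodies as plain text lines.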

Thanks,

Nikhil Shirke

Hari Shreedharan 2013-03-27, 18:30