Flume user mailing list: FW: Flume - HTTPSource & HDFSSink


Re: Flume - HTTPSource & HDFSSink
Nikhil,  

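Flume's HDFS sink writes to HDFS as SequenceFiles by default. If you want it to write text or Avro, you must set the file type to DataStream. Note that the HDFS-specific properties take an hdfs. prefix, so the line in your configuration should read agent.sinks.hdfsSink.hdfs.fileType = DataStream. Without the prefix the setting is ignored and the sink falls back to SequenceFile. (The same applies to hdfs.writeFormat, although that property only affects SequenceFile output.) Please see the Flume User Guide.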
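Because the HDFS sink reads its HDFS-specific settings only from keys carrying the hdfs. prefix, a key such as agent.sinks.hdfsSink.fileType is silently ignored and the sink keeps its SequenceFile default. The minimal Python sketch below flags such keys in a Flume properties file. The helper names and the HDFS_SINK_KEYS set are hypothetical: the set is an assumed subset of the hdfs.* properties, not the complete list from the User Guide, and the parser only handles simple "key = value" lines (no ":" separators or line continuations).

```python
# Hypothetical check: which HDFS-sink settings were written without "hdfs."?
# HDFS_SINK_KEYS is an assumed subset of the sink's hdfs.* properties.
HDFS_SINK_KEYS = {"path", "filePrefix", "fileType", "writeFormat",
                  "rollInterval", "rollSize", "rollCount", "batchSize", "codeC"}


def parse_properties(text):
    """Minimal 'key = value' parser; skips blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def unprefixed_hdfs_keys(props, agent, sink):
    """Return sink keys that name an hdfs.* setting but lack the prefix."""
    base = f"{agent}.sinks.{sink}."
    return sorted(k for k in props
                  if k.startswith(base) and k[len(base):] in HDFS_SINK_KEYS)


CONFIG = """
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://10.187.142.39/flume/
agent.sinks.hdfsSink.fileType = DataStream
agent.sinks.hdfsSink.writeFormat = Text
"""

print(unprefixed_hdfs_keys(parse_properties(CONFIG), "agent", "hdfsSink"))
# ['agent.sinks.hdfsSink.fileType', 'agent.sinks.hdfsSink.writeFormat']
```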

Thanks,
Hari

--  
Hari Shreedharan
On Tuesday, March 26, 2013 at 11:01 PM, Nikhil Shirke wrote:

> Hello,
>  
> I have the following configuration:
>  
> agent.sources = httpSrc
> agent.channels = memoryChannel
> agent.sinks = hdfsSink  
>  
> # For each one of the sources, the type is defined
> agent.sources.httpSrc.type = org.apache.flume.source.http.HTTPSource
> agent.sources.httpSrc.port = 9000
> agent.sources.httpSrc.handler = org.apache.flume.source.http.JSONHandler  
>  
> # The channel can be defined as follows.
> agent.sources.httpSrc.channels = memoryChannel  
>  
> # Each sink's type must be defined
> agent.sinks.hdfsSink.type = hdfs
> agent.sinks.hdfsSink.hdfs.path = hdfs://10.187.142.39/flume/
> agent.sinks.hdfsSink.fileType = DataStream
> agent.sinks.hdfsSink.writeFormat = Text
> agent.sinks.hdfsSink.serializer = Text  
>  
> #Specify the channel the sink should use
> agent.sinks.hdfsSink.channel = memoryChannel
> agent.sinks.logSink.channel = memoryChannel  
>  
> # Each channel's type is defined.
> agent.channels.memoryChannel.type = memory  
> # Other config values specific to each type of channel(sink or source)
> # can be defined as well
> # In this case, it specifies the capacity of the memory channel
> agent.channels.memoryChannel.capacity = 1000
> agent.channels.memoryChannel.transactionCapacity = 100  
>  
> When I execute the following command, it generates a file in the /flume folder:
> curl -X POST -d '[{ "headers" : { "timestamp" : "434324343", "host" : "random_host.example.com" }, "body" : "random_body" }, { "headers" : { "namenode" : "namenode.example.com", "datanode" : "random_datanode.example.com" }, "body" : "really_random_body" }]' 10.187.142.125:9000
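JSONHandler reads the POST body as a JSON array in which each event is an object with a "headers" map and a "body" string, as in the curl command above. A minimal Python sketch of the same request, using only the standard library, follows. build_payload, post_events and EXAMPLE_EVENTS are hypothetical names, and the application/json Content-Type is an assumption (curl -d actually sends form encoding).

```python
import json
import urllib.request


def build_payload(events):
    """Serialize (headers, body) pairs into the JSON array layout above."""
    return json.dumps(
        [{"headers": dict(headers), "body": body} for headers, body in events]
    ).encode("utf-8")


def post_events(url, events, timeout=10):
    """POST the events to an HTTPSource endpoint and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=build_payload(events),
        # Assumed content type; the curl command above sends form encoding.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# The two events from the curl command, with string-valued headers.
EXAMPLE_EVENTS = [
    ({"timestamp": "434324343", "host": "random_host.example.com"},
     "random_body"),
    ({"namenode": "namenode.example.com",
      "datanode": "random_datanode.example.com"},
     "really_random_body"),
]

# Example (not executed here; needs the running agent):
# post_events("http://10.187.142.125:9000", EXAMPLE_EVENTS)
```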
>  
> However, the file contents are as follows, and the file is in binary format:
> SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable<binary data>
> <binary data>random_bod<binary data>really_random_body
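The start of that output is the SequenceFile header: the magic bytes SEQ, a version byte (not printable, so it does not show in the paste), and then the key and value class names, each written as a Hadoop variable-length integer followed by the UTF-8 name. For names shorter than 128 bytes the length is a single byte, which is why "!" (33, the length of org.apache.hadoop.io.LongWritable) and '"' (34, the length of org.apache.hadoop.io.BytesWritable) appear right before the class names. The Python sketch below reads just those fields. It is a hypothetical helper that handles only single-byte lengths and ignores the rest of the header (compression flags, metadata, sync marker).

```python
def read_small_vint(buf, pos):
    """Read a Hadoop VInt at buf[pos]; only the one-byte form (-112..127)."""
    first = buf[pos] - 256 if buf[pos] > 127 else buf[pos]
    if first < -112:
        raise ValueError("multi-byte VInt: not handled by this sketch")
    return first, pos + 1


def read_hadoop_string(buf, pos):
    """Read a VInt length followed by that many UTF-8 bytes."""
    length, pos = read_small_vint(buf, pos)
    end = pos + length
    return buf[pos:end].decode("utf-8"), end


def parse_seqfile_header(buf):
    """Return (version, key_class, value_class) from the start of a SequenceFile."""
    if buf[:3] != b"SEQ":
        raise ValueError("not a SequenceFile: missing 'SEQ' magic")
    key_class, pos = read_hadoop_string(buf, 4)  # byte 3 is the version
    value_class, _ = read_hadoop_string(buf, pos)
    return buf[3], key_class, value_class
```

To check a file copied out of HDFS, pass its first few hundred bytes, for example parse_seqfile_header(open(path, "rb").read(512)). A DataStream file written with the text serializer begins with the event body instead, so it would normally fail the SEQ check.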
>  
> Thanks,
> Nikhil Shirke