NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Flume >> mail # user >> Can HDFSSink write headers as well?


Re: Can HDFSSink write headers as well?
On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー
<[EMAIL PROTECTED]> wrote:

> Hi David,
>
> Currently there is no way to write headers to HDFS using the built-in
> Flume functionality.
>

This is not entirely true: the following combination will write headers to
HDFS in the Avro data file format (binary):

agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
agent.sinks.hdfsBinarySink.serializer = avro_event
agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable

The serializer used is part of the Flume distribution, namely
flume-ng-core/src/main/java/org/apache/flume/serialization/FlumeEventAvroEventSerializer.java
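
To make concrete why the headers survive this setup, here is a stdlib-only Java sketch of how a single event record is laid out in Avro's binary encoding. It assumes the record schema is {headers: map<string>, body: bytes}, which is my understanding of what FlumeEventAvroEventSerializer writes; the thread itself does not spell the schema out. The class name AvroEventRecordSketch is hypothetical, and the container-file framing a real Avro data file adds (file header, data blocks, sync markers) is deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: Avro binary encoding of one record with the ASSUMED schema
 * {headers: map<string>, body: bytes}. Container-file framing is omitted.
 */
public class AvroEventRecordSketch {

    // Avro encodes int/long values as zig-zag varints.
    public static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);           // zig-zag: 0->0, 1->2, 2->4 ...
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits + continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro bytes and strings: length (as a long), then the raw bytes.
    public static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    public static void writeString(ByteArrayOutputStream out, String s) {
        writeBytes(out, s.getBytes(StandardCharsets.UTF_8));
    }

    /** Encodes the headers map (as a single block) followed by the body. */
    public static byte[] encodeEvent(Map<String, String> headers, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!headers.isEmpty()) {
            writeLong(out, headers.size());      // item count of this map block
            for (Map.Entry<String, String> e : headers.entrySet()) {
                writeString(out, e.getKey());
                writeString(out, e.getValue());
            }
        }
        writeLong(out, 0);                        // a zero-count block ends the map
        writeBytes(out, body);                    // the body field
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("host", "a");
        byte[] rec = encodeEvent(headers, "hi".getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        for (byte b : rec) sb.append(String.format("%02x ", b));
        System.out.println(sb.toString().trim());
    }
}
```

Running main prints `02 08 68 6f 73 74 02 61 00 04 68 69`: one map block holding one entry (`host` = `a`), the 0 that ends the map, then the two-byte body `hi`. The header key and value sit in the record right next to the body, which is why a reader of the file gets them back.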

A file written this way can be processed with the Avro MapReduce API found
in the Avro distribution.

Also note that using DataStream alone doesn't make the output a text file;
the serializer and hdfs.writeFormat also determine whether the file is text
or binary.
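
A quick way to check what the sink actually produced is to look at the first bytes of an output file after copying it out of HDFS (for example with `hadoop fs -get`). Per the Avro specification, Avro object container files begin with the four bytes `O`, `b`, `j`, `0x01`, and a plain text file will not. The class name AvroMagicCheck is hypothetical, and the check only recognizes Avro container files; it says nothing about other binary formats.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: detect an Avro object container file by its 4-byte magic. */
public class AvroMagicCheck {
    // Avro container files start with 'O', 'b', 'j' followed by the byte 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** True if the given leading bytes match the Avro container magic. */
    public static boolean hasAvroMagic(byte[] head) {
        if (head.length < MAGIC.length) return false;
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) return false;
        }
        return true;
    }

    /** Reads at most the first 4 bytes of a local file and checks them. */
    public static boolean isAvroContainer(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = in.readNBytes(MAGIC.length); // Java 11+
            return hasAvroMagic(head);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String a : args) {
            boolean avro = isAvroContainer(Paths.get(a));
            System.out.println(a + ": " + (avro ? "Avro data file" : "not Avro"));
        }
    }
}
```

A file written with hdfs.fileType = CompressedStream starts with the compression codec's own header instead, so this check reports "not Avro" for it even when the decompressed payload is an Avro data file.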

I've read the entire HDFS sink code and experimented with it a lot, so if
you want more details, let me know.

>
> If you are writing to text or binary files on HDFS (i.e. you have set
> hdfs.fileType = DataStream or CompressedStream in your config), then you
> can supply your own custom serializer, which will allow you to write
> headers to HDFS. You will need to write a serializer that implements
> org.apache.flume.serialization.EventSerializer.
>
> If, on the other hand, you are writing to HDFS SequenceFiles, then
> unfortunately there is no way to customize the way that events are
> serialized, so you cannot write event headers to HDFS. This is a known
> issue (FLUME-1100) and I have supplied a patch to fix it.
>
> Chris.
>
>
>
> On 2012/08/21 11:36, David Capwell wrote:
>
>> I was wondering: if I put arbitrary data in an event's headers, can the
>> HDFSSink write it to HDFS? I know it can use the headers to split the data
>> into different paths, but what about writing the header data to HDFS itself?
>>
>> Thanks for taking the time to read this email.
>>
>
>
>
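
Chris's suggestion of a custom serializer implementing org.apache.flume.serialization.EventSerializer comes down to turning each event's headers and body into bytes inside its write(Event) method. The sketch below keeps Flume out of the picture and shows only that formatting step. The class name HeaderBodyLineFormatter and the line layout (sorted key=value pairs, a tab, the body, a newline) are hypothetical choices, not part of Flume. In a real serializer, write(Event) would call format(event.getHeaders(), event.getBody()) and write the result to the OutputStream the sink passes to the serializer's Builder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not a Flume class): renders one event as the text
 * line "k1=v1,k2=v2<TAB>body\n".
 */
public final class HeaderBodyLineFormatter {
    private HeaderBodyLineFormatter() {}

    public static byte[] format(Map<String, String> headers, byte[] body) {
        StringBuilder sb = new StringBuilder();
        // Sort keys so the same headers always produce the same line.
        for (Map.Entry<String, String> e : new TreeMap<>(headers).entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        sb.append('\t');
        // Assumes a UTF-8 text body; binary bodies would need escaping.
        sb.append(new String(body, StandardCharsets.UTF_8));
        sb.append('\n');
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new TreeMap<>();
        headers.put("timestamp", "1345512000000");
        headers.put("host", "web01");
        byte[] line = format(headers, "GET /index.html".getBytes(StandardCharsets.UTF_8));
        // Prints: host=web01,timestamp=1345512000000<TAB>GET /index.html
        System.out.print(new String(line, StandardCharsets.UTF_8));
    }
}
```

This layout does not escape commas, equals signs, tabs or newlines inside header values or the body. If those can occur, a binary format such as the Avro data file described earlier in the thread is the safer choice.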