Re: Can HDFSSink write headers as well?
Bhaskar V. Karambelkar 2012-08-21, 15:22
On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー wrote:
> Hi David,
> Currently there is no way to write headers to HDFS using the built-in
> Flume functionality.
This is not entirely true. The following combination will write headers to
HDFS in the Avro data file format (binary):
agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
agent.sinks.hdfsBinarySink.serializer = avro_event
agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable
The serializer used is part of the Flume distribution, viz.
A file thus written can be processed with the Avro MapReduce API found in Avro
Also note that simply using DataStream doesn't mean the output is a text
file. The serializer and hdfs.writeFormat also decide whether the file is
text or binary.
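
To make the text-versus-binary point concrete, here is a rough side-by-side
sketch of two sinks that both use DataStream. It assumes the built-in
serializer aliases `text` and `avro_event`. The channel name, the text sink
name, and both HDFS paths are made up for illustration. I have not checked
how every Flume version treats hdfs.writeFormat with DataStream, so treat
it as a starting point rather than a verified config.

```properties
# Hypothetical names: memChannel, hdfsTextSink, and both hdfs.path values.
agent.sinks = hdfsTextSink hdfsBinarySink

# Text output: the text serializer writes event bodies only, one per line,
# so headers do not reach HDFS.
agent.sinks.hdfsTextSink.type = hdfs
agent.sinks.hdfsTextSink.channel = memChannel
agent.sinks.hdfsTextSink.hdfs.path = hdfs://namenode/flume/text
agent.sinks.hdfsTextSink.hdfs.fileType = DataStream
agent.sinks.hdfsTextSink.hdfs.writeFormat = text
agent.sinks.hdfsTextSink.serializer = text

# Binary output: the Avro event serializer writes an Avro data file whose
# records carry both the event headers and the event body.
agent.sinks.hdfsBinarySink.type = hdfs
agent.sinks.hdfsBinarySink.channel = memChannel
agent.sinks.hdfsBinarySink.hdfs.path = hdfs://namenode/flume/avro
agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable
agent.sinks.hdfsBinarySink.serializer = avro_event
```

Both sinks share fileType = DataStream; only the serializer and writeFormat
differ between them.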
I've read the entire HDFS sink code and experimented with it a lot, so if
you want more details, let me know.
> If you are writing to text or binary files on HDFS (i.e. you have set
> hdfs.fileType = DataStream or CompressedStream in your config), then you
> can supply your own custom serializer, which will allow you to write
> headers to HDFS. You will need to write a serializer that implements
> If, on the other hand, you are writing to HDFS SequenceFiles, then
> unfortunately there is no way to customize the way that events are
> serialized, so you cannot write event headers to HDFS. This is a known
> issue (FLUME-1100) and I have supplied a patch to fix it.
> On 2012/08/21 11:36, David Capwell wrote:
>> I was wondering if I pass random data to an event's header, can the
>> HDFSSink write it to HDFS? I know it can use the headers to split the data
>> into different paths, but what about writing the data to HDFS itself?
>> thanks for your time reading this email.