Flume >> mail # user >> HDFS Sink writeformat / filetype / serializer


Re: HDFS Sink writeformat / filetype / serializer
Could you report the behaviour that you consider invalid to the Flume JIRA (1)?

Also, please do not hesitate to submit patches for the user guide describing your findings, so that others with the same questions do not have to go through the same exercise.

Jarcec
 
1: https://issues.apache.org/jira/browse/FLUME

On Tue, Jul 31, 2012 at 11:13:42AM -0400, Gumnaam Sur wrote:
> To add to the question,
>
> I've set up 4 HDFS sinks as follows:
>
> a) seqaeSink , serializer = avro_event , fileType = SequenceFile
> b) seqtSink , serializer = text , fileType = SequenceFile
> c) dsaeSink , serializer = avro_event , fileType = DataStream
> d) dstSink , serializer = text , fileType = DataStream , writeFormat = text
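>
> As a rough sketch, the sink definitions look something like the following.
> The agent name (agent), the channel names (ch1 to ch4) and the HDFS paths
> are placeholders assumed for illustration; only the fileType, serializer
> and writeFormat values are the ones listed above.
>
> ```properties
> # Placeholder agent name; one channel per sink is an assumption.
> agent.sinks = seqaeSink seqtSink dsaeSink dstSink
>
> # a) avro_event serializer + SequenceFile
> agent.sinks.seqaeSink.type = hdfs
> agent.sinks.seqaeSink.channel = ch1
> agent.sinks.seqaeSink.hdfs.path = /flume/seqae
> agent.sinks.seqaeSink.hdfs.fileType = SequenceFile
> agent.sinks.seqaeSink.serializer = avro_event
>
> # b) text serializer + SequenceFile
> agent.sinks.seqtSink.type = hdfs
> agent.sinks.seqtSink.channel = ch2
> agent.sinks.seqtSink.hdfs.path = /flume/seqt
> agent.sinks.seqtSink.hdfs.fileType = SequenceFile
> agent.sinks.seqtSink.serializer = text
>
> # c) avro_event serializer + DataStream
> agent.sinks.dsaeSink.type = hdfs
> agent.sinks.dsaeSink.channel = ch3
> agent.sinks.dsaeSink.hdfs.path = /flume/dsae
> agent.sinks.dsaeSink.hdfs.fileType = DataStream
> agent.sinks.dsaeSink.serializer = avro_event
>
> # d) text serializer + DataStream, with writeFormat set to Text
> agent.sinks.dstSink.type = hdfs
> agent.sinks.dstSink.channel = ch4
> agent.sinks.dstSink.hdfs.path = /flume/dst
> agent.sinks.dstSink.hdfs.fileType = DataStream
> agent.sinks.dstSink.hdfs.writeFormat = Text
> agent.sinks.dstSink.serializer = text
> ```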
>
> The problem is that the seqae sink doesn't write AvroEvent objects. Instead,
> it writes a Sequence File of LongWritable, BytesWritable, and this is WRONG.
> The Sequence File should contain AvroEvent objects.
>
> The seqt sink works correctly, in that it writes a Sequence File of
> LongWritable, BytesWritable.
>
> The dsae sink writes a Data Stream file (each event separated by a newline)
> of Avro Events.
>
> The dst sink writes the plain message body to the file, and that's correct too.
>
> So in conclusion, the combination
>  serializer = avro_event , fileType = SequenceFile
> is not working as expected; it behaves just like the combination
>  serializer = text , fileType = SequenceFile
>
>
> On Tue, Jul 31, 2012 at 10:11 AM, Gumnaam Sur <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > For the HDFS Sink we have 3 properties which determine the type and content
> > of what gets written to the file.
> >
> > writeFormat = text | writable
> > fileType = SequenceFile | DataStream | CompressedStream
> > serializer = text | avro_event | <custom>
> >
> > Can one of the devs explain these in detail, along with the output expected
> > from the various permutations / combinations of the 3 values, and whether any
> > combination is invalid?
> >
> > e.g. what's the difference between the combo
> > serializer = avro_event , fileType = SequenceFile
> > and
> > serializer = avro_event , fileType = DataStream
> >
> > Also, what's the difference between writeFormat = 'text' and writeFormat = 'writable'?
> >
> > To give some background, I am looking to serialize Avro Events into HDFS in
> > a Sequence File, and I am trying to use org.apache.avro.mapreduce.* from my
> > Hadoop jobs. I figure using SequenceFile should give better performance than
> > text, but I am not exactly sure about the various Flume options I mentioned
> > above.
> >
> > Thanks
> >