Flume >> mail # user >> Flume and HDFS integration


Re: Flume and HDFS integration
Hello Brock,
First of all, thank you for answering my questions. I appreciate it, since I am a real newbie with Flume, Hadoop, etc.

But now I am confused. According to your statement, the file type is the key here. Now take a look at my flume.conf below: the file type is set to "DataStream".
So which is the right one: SequenceFile, DataStream or CompressedStream?

agent1.channels = MemoryChannel-2
agent1.channels.MemoryChannel-2.type = memory

agent1.sources = tail
agent1.sources.tail.channels = MemoryChannel-2
agent1.sources.tail.type = exec
agent1.sources.tail.command = tail -F /opt/apache2/logs/access_log

agent1.sinks = HDFS
agent1.sinks.HDFS.channel = MemoryChannel-2
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.file.Type = DataStream
agent1.sinks.HDFS.hdfs.path = hdfs://localhost:9000
#agent1.sinks.HDFS.hdfs.path = /mnt/hdfs/data
agent1.sinks.HDFS.hdfs.writeFormat = Text

Many thanks,
Emile
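
For comparison, here is a sketch of the same sink section with the file type key spelled the way the Flume User Guide lists it, hdfs.fileType (one word, camel case). It assumes that the hdfs.file.Type key above is not recognized by the sink, in which case the documented default, SequenceFile, would apply. That assumption has not been verified against this particular setup.

agent1.sinks = HDFS
agent1.sinks.HDFS.channel = MemoryChannel-2
agent1.sinks.HDFS.type = hdfs
# Key name as listed in the Flume User Guide. Documented values are
# SequenceFile (the default), DataStream and CompressedStream;
# CompressedStream additionally needs hdfs.codeC to be set.
agent1.sinks.HDFS.hdfs.fileType = DataStream
agent1.sinks.HDFS.hdfs.path = hdfs://localhost:9000
agent1.sinks.HDFS.hdfs.writeFormat = Text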

-------- Original Message --------
> Date: Thu, 29 Nov 2012 19:26:37 -0600
> From: Brock Noland <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: Flume and HDFS integration

> Hi,
>
> On Thu, Nov 29, 2012 at 7:17 PM, Roman Shaposhnik <[EMAIL PROTECTED]> wrote:
> > On Thu, Nov 29, 2012 at 9:18 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
> >> 1) It's a sequence file; you can change it to a text file if you want. See
> >> FileType here http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
> >
> > Don't you also have to change a serialization format to get rid of the
> > binary structure completely? IOW, you'd have to add something like:
> >     agent.sinks.hdfsSink.hdfs.serializer =
> >         org.apache.flume.serialization.BodyTextEventSerializer
>
> BodyTextEventSerializer is the default serializer. Serializers decide
> how to turn Events into records while fileType decides what type of
> file the event is written to.
>
> Brock
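
The distinction Brock describes maps onto two separate sink properties. A minimal sketch, reusing the agent1 and HDFS names from the configuration earlier in this message, and assuming the serializer key and its TEXT alias as the current Flume User Guide describes them (older releases may differ):

# Container: what kind of file the sink writes.
agent1.sinks.HDFS.hdfs.fileType = DataStream
# Record encoding: how each event is turned into a record in that file.
# TEXT is the documented default and corresponds to BodyTextEventSerializer,
# so this line can be omitted.
agent1.sinks.HDFS.serializer = TEXT

With these two settings the output should be plain text, one event body per line, rather than a binary SequenceFile.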