Flume >> mail # user >> HDFS Event Sink problems


Harish Mandala 2012-09-24, 22:01
Harish Mandala 2012-09-25, 19:17
Re: HDFS Event Sink problems
Harish,
What did you find on your side? Could it be related to
https://issues.apache.org/jira/browse/FLUME-1610? I am looking into that
issue right now.

Regards,
Mike

On Tue, Sep 25, 2012 at 12:17 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:

> Thanks, but I have figured out why this is happening.
>
> On Mon, Sep 24, 2012 at 6:01 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>>
>> I’m having some trouble with the HDFS Event Sink. I’m using the latest
>> version of Flume NG, checked out today.
>>
>>
>> I am using curloader to hit “MycustomSource”, which essentially takes in
>> HTTP messages and splits the content into two “kinds” of Flume events
>> (differentiated by a header key-value pair). The first kind is sent to
>> hdfs-sink1 and the second kind to hdfs-sink2 by a multiplexing selector,
>> as outlined in the configuration below. There’s also an hdfs-sink3, which
>> can be ignored at present.
>>
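[Editor's note: the multiplexing selector lines themselves do not appear in the conf that follows. For reference, a typical Flume NG multiplexing-selector setup looks like the sketch below; the header name `eventType` and the values `kind1`/`kind2` are placeholders for whatever key-value pairs MycustomSource actually sets, not taken from the original post.]

```properties
# Hypothetical selector config; header/value names are illustrative only.
agent1.sources.mycustom-source1.selector.type = multiplexing
agent1.sources.mycustom-source1.selector.header = eventType
agent1.sources.mycustom-source1.selector.mapping.kind1 = ch1
agent1.sources.mycustom-source1.selector.mapping.kind2 = ch2
# Events whose header matches no mapping go to the default channel(s).
agent1.sources.mycustom-source1.selector.default = ch3
```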
>> I can’t really understand what’s going on. It seems related to some of
>> the race condition issues outlined here:
>>
>> https://issues.apache.org/jira/browse/FLUME-1219
>>
>>
>> Please let me know if you need more information.
>>
>>
>> The following is my conf file. It is followed by flume.log.
>>
>>
>> #### flume.conf ####
>>
>> agent1.channels = ch1 ch2 ch3
>> agent1.sources = mycustom-source1
>> agent1.sinks = hdfs-sink1 hdfs-sink2 hdfs-sink3
>>
>> # Define three memory channels on agent1
>> agent1.channels.ch1.type = memory
>> agent1.channels.ch1.capacity = 200000
>> agent1.channels.ch1.transactionCapacity = 20000
>>
>> agent1.channels.ch2.type = memory
>> agent1.channels.ch2.capacity = 1000000
>> agent1.channels.ch2.transactionCapacity = 100000
>>
>> agent1.channels.ch3.type = memory
>> agent1.channels.ch3.capacity = 10000
>> agent1.channels.ch3.transactionCapacity = 5000
>>
>> # Define a custom source called mycustom-source1 on agent1 and tell it
>> # to bind to 127.0.0.1:1234. Connect it to channels ch1, ch2 and ch3.
>> agent1.sources.mycustom-source1.channels = ch1 ch2 ch3
>> agent1.sources.mycustom-source1.type = org.apache.flume.source.MycustomSource
>> agent1.sources.mycustom-source1.bind = 127.0.0.1
>> agent1.sources.mycustom-source1.port = 1234
>> agent1.sources.mycustom-source1.serialization_method = json
>> #agent1.sources.mycustom-source1.schema_filepath = /home/ubuntu/Software/flume/trunk/conf/AvroEventSchema.avpr
>>
>> # Define the HDFS sinks
>> agent1.sinks.hdfs-sink1.channel = ch1
>> agent1.sinks.hdfs-sink1.type = hdfs
>> agent1.sinks.hdfs-sink1.hdfs.path = hdfs://localhost:54310/user/flumeDump1
>> agent1.sinks.hdfs-sink1.hdfs.filePrefix = events
>> agent1.sinks.hdfs-sink1.hdfs.batchSize = 20000
>> agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
>> agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text
>> agent1.sinks.hdfs-sink1.hdfs.maxOpenFiles = 10000
>> agent1.sinks.hdfs-sink1.hdfs.rollSize = 0
>> agent1.sinks.hdfs-sink1.hdfs.rollInterval = 0
>> agent1.sinks.hdfs-sink1.hdfs.rollCount = 20000
>> agent1.sinks.hdfs-sink1.hdfs.threadsPoolSize = 20
>>
>> agent1.sinks.hdfs-sink2.channel = ch2
>> agent1.sinks.hdfs-sink2.type = hdfs
>> agent1.sinks.hdfs-sink2.hdfs.path = hdfs://localhost:54310/user/flumeDump2
>> agent1.sinks.hdfs-sink2.hdfs.filePrefix = events
>> agent1.sinks.hdfs-sink2.hdfs.batchSize = 100000
>> agent1.sinks.hdfs-sink2.hdfs.fileType = DataStream
>> agent1.sinks.hdfs-sink2.hdfs.writeFormat = Text
>> agent1.sinks.hdfs-sink2.hdfs.maxOpenFiles = 10000
>> agent1.sinks.hdfs-sink2.hdfs.rollSize = 0
>> agent1.sinks.hdfs-sink2.hdfs.rollInterval = 0
>> agent1.sinks.hdfs-sink2.hdfs.rollCount = 100000
>> agent1.sinks.hdfs-sink2.hdfs.threadsPoolSize = 20
>>
>> agent1.sinks.hdfs-sink3.channel = ch3
>> agent1.sinks.hdfs-sink3.type = hdfs
>> agent1.sinks.hdfs-sink3.hdfs.path = hdfs://localhost:54310/user/flumeDump3
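[Editor's note: one constraint worth double-checking in configs like the above is that the HDFS sink drains `hdfs.batchSize` events from its channel inside a single channel transaction, so `batchSize` must not exceed that channel's `transactionCapacity`. The fragment below simply restates the relevant pairs from the posted conf to make the relationship visible; it adds no new settings.]

```properties
# Each sink's batchSize is at the limit of its channel's transaction size:
agent1.channels.ch1.transactionCapacity   = 20000
agent1.sinks.hdfs-sink1.hdfs.batchSize    = 20000
agent1.channels.ch2.transactionCapacity   = 100000
agent1.sinks.hdfs-sink2.hdfs.batchSize    = 100000
```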
Harish Mandala 2012-09-26, 11:49