Flume user mailing list: HDFS Event Sink problems


Harish Mandala 2012-09-24, 22:01
Harish Mandala 2012-09-25, 19:17

Re: HDFS Event Sink problems
Harish,
What did you find on your side? Could it be related to
https://issues.apache.org/jira/browse/FLUME-1610 ? I am looking at that
issue right now.

Regards,
Mike

On Tue, Sep 25, 2012 at 12:17 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:

> Thanks, but I figured out why this is happening.
>
> On Mon, Sep 24, 2012 at 6:01 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>>
>> I’m having some trouble with the HDFS Event Sink. I’m using the latest
>> version of Flume NG, checked out today.
>>
>>
>> I am using curloader to hit “MycustomSource”, which essentially takes in
>> HTTP messages and splits the content into two “kinds” of Flume events
>> (differentiated by a header key-value pair). The first kind is sent to
>> hdfs-sink1, and the second kind to hdfs-sink2, by a multiplexing selector
>> as outlined in the configuration below (a sketch of the selector stanza
>> follows the truncated config). There’s also an hdfs-sink3, which can be
>> ignored at present.
>>
>> I can’t really understand what’s going on. It seems related to some of
>> the race condition issues outlined here:
>>
>> https://issues.apache.org/jira/browse/FLUME-1219
>>
>>
>> Please let me know if you need more information.
>>
>>
>> The following is my conf file. It is followed by flume.log.
>>
>>
>> #### flume.conf ####
>>
>> agent1.channels = ch1 ch2 ch3
>>
>> agent1.sources = mycustom-source1
>>
>> agent1.sinks = hdfs-sink1 hdfs-sink2 hdfs-sink3
>>
>> # Define a memory channel called ch1 on agent1
>>
>> agent1.channels.ch1.type = memory
>>
>> agent1.channels.ch1.capacity = 200000
>>
>> agent1.channels.ch1.transactionCapacity = 20000
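>>
>> # transactionCapacity caps how many events fit in one channel transaction;
>> # it must be at least the draining sink's hdfs.batchSize (20000 for hdfs-sink1)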
>>
>> agent1.channels.ch2.type = memory
>>
>> agent1.channels.ch2.capacity = 1000000
>>
>> agent1.channels.ch2.transactionCapacity = 100000
>>
>> agent1.channels.ch3.type = memory
>>
>> agent1.channels.ch3.capacity = 10000
>>
>> agent1.channels.ch3.transactionCapacity = 5000
>>
>>
>>
>> #agent1.channels.ch2.type = memory
>>
>> #agent1.channels.ch3.type = memory
>>
>>
>>
>> # Define a Mycustom custom source called mycustom-source1 on agent1 and tell it
>>
>> # to bind to 127.0.0.1:1234. Connect it to channels ch1, ch2, and ch3.
>>
>> agent1.sources.mycustom-source1.channels = ch1 ch2 ch3
>>
>> agent1.sources.mycustom-source1.type = org.apache.flume.source.MycustomSource
>>
>> agent1.sources.mycustom-source1.bind = 127.0.0.1
>>
>> agent1.sources.mycustom-source1.port = 1234
>>
>> agent1.sources.mycustom-source1.serialization_method = json
>>
>> #agent1.sources.mycustom-source1.schema_filepath = /home/ubuntu/Software/flume/trunk/conf/AvroEventSchema.avpr
>>
>>
>>
>> # Define an HDFS sink
>>
>> agent1.sinks.hdfs-sink1.channel = ch1
>>
>> agent1.sinks.hdfs-sink1.type = hdfs
>>
>> agent1.sinks.hdfs-sink1.hdfs.path = hdfs://localhost:54310/user/flumeDump1
>>
>> agent1.sinks.hdfs-sink1.hdfs.filePrefix = events
>>
>> agent1.sinks.hdfs-sink1.hdfs.batchSize = 20000
>>
>> agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
>>
>> agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text
>>
>> agent1.sinks.hdfs-sink1.hdfs.maxOpenFiles = 10000
>>
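>> # rollSize = 0 and rollInterval = 0 disable size- and time-based rolling,
>> # so files here roll purely on event count (rollCount)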
>> agent1.sinks.hdfs-sink1.hdfs.rollSize = 0
>>
>> agent1.sinks.hdfs-sink1.hdfs.rollInterval = 0
>>
>> agent1.sinks.hdfs-sink1.hdfs.rollCount = 20000
>>
>> agent1.sinks.hdfs-sink1.hdfs.threadsPoolSize = 20
>>
>>
>>
>> agent1.sinks.hdfs-sink2.channel = ch2
>>
>> agent1.sinks.hdfs-sink2.type = hdfs
>>
>> agent1.sinks.hdfs-sink2.hdfs.path = hdfs://localhost:54310/user/flumeDump2
>>
>> agent1.sinks.hdfs-sink2.hdfs.filePrefix = events
>>
>> agent1.sinks.hdfs-sink2.hdfs.batchSize = 100000
>>
>> agent1.sinks.hdfs-sink2.hdfs.fileType = DataStream
>>
>> agent1.sinks.hdfs-sink2.hdfs.writeFormat = Text
>>
>> agent1.sinks.hdfs-sink2.hdfs.maxOpenFiles = 10000
>>
>> agent1.sinks.hdfs-sink2.hdfs.rollSize = 0
>>
>> agent1.sinks.hdfs-sink2.hdfs.rollInterval = 0
>>
>> agent1.sinks.hdfs-sink2.hdfs.rollCount = 100000
>>
>> agent1.sinks.hdfs-sink2.hdfs.threadsPoolSize = 20
>>
>>
>>
>> agent1.sinks.hdfs-sink3.channel = ch3
>>
>> agent1.sinks.hdfs-sink3.type = hdfs
>>
>> agent1.sinks.hdfs-sink3.hdfs.path = hdfs://localhost:54310/user/flumeDump3
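
The quoted config cuts off here, before the selector section Harish mentions.
For reference, a minimal sketch of what that multiplexing selector stanza
typically looks like in Flume NG; the header key "eventKind" and the values
"kind1" and "kind2" are hypothetical stand-ins, since the actual key and
values are not shown in the thread:

#### selector sketch (hypothetical header and values) ####

agent1.sources.mycustom-source1.selector.type = multiplexing
agent1.sources.mycustom-source1.selector.header = eventKind
agent1.sources.mycustom-source1.selector.mapping.kind1 = ch1
agent1.sources.mycustom-source1.selector.mapping.kind2 = ch2
# events with no matching header value fall through to ch3 (hdfs-sink3)
agent1.sources.mycustom-source1.selector.default = ch3
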
Harish Mandala 2012-09-26, 11:49