Flume, mail # user - HDFS Sink Performance


Re: HDFS Sink Performance
Denny Ye 2012-08-27, 12:04
Hi Shara,
    You are using MemoryChannel as the event store. In my tests with
locally modified code I got 45 MB/sec without any full GC. Is that your
goal, or do you need even higher throughput?

-Regards
Denny Ye

2012/8/27 Shara Shi <[EMAIL PROTECTED]>

> Hi All,
>
> No matter how I tune the HDFS sink parameters, I can't get throughput
> above 20 MB per minute.
>
> Is that normal? It seems strange to me.
>
> How can I improve it?
>
> Regards
> Ruihong Shi
>
> ==========================================
>
> # or more contributor license agreements.  See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership.  The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License.  You may obtain a copy of the License at
> #
> #  http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing,
> # software distributed under the License is distributed on an
> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> # KIND, either express or implied.  See the License for the
> # specific language governing permissions and limitations
> # under the License.
>
> # Define a memory channel called ch2 on collector2
> collector2.channels.ch2.type = memory
> collector2.channels.ch2.capacity = 500000
> collector2.channels.ch2.keep-alive = 1
>
> # Define an Avro source called avro-source1 on collector2 and tell it
> # to bind to 0.0.0.0:41415. Connect it to channel ch2.
> collector2.sources.avro-source1.channels = ch2
> collector2.sources.avro-source1.type = avro
> collector2.sources.avro-source1.bind = 0.0.0.0
> collector2.sources.avro-source1.port = 41415
> collector2.sources.avro-soruce1.threads = 10
>
> # Define a hdfs sink
> collector2.sinks.hdfs.channel = ch2
> collector2.sinks.hdfs.type = hdfs
> collector2.sinks.hdfs.hdfs.path = hdfs://namenode:8020/user/root/flume/webdata/exec/%Y/%m/%d/%H
> collector2.sinks.hdfs.batchsize = 50000
> collector2.sinks.hdfs.runner.type = polling
> collector2.sinks.hdfs.runner.polling.interval = 1
> collector2.sinks.hdfs.hdfs.rollInterval = 120
> collector2.sinks.hdfs.hdfs.rollSize = 0
> collector2.sinks.hdfs.hdfs.rollCount = 300000
> collector2.sinks.hdfs.hdfs.fileType = DataStream
> collector2.sinks.hdfs.hdfs.round = true
> collector2.sinks.hdfs.hdfs.roundValue = 10
> collector2.sinks.hdfs.hdfs.roundUnit = minute
> collector2.sinks.hdfs.hdfs.threadsPoolSize = 10
> collector2.sinks.hdfs.hdfs.rollTimerPoolSize = 10
>
> # Finally, now that we've defined all of our components, tell
> # collector2 which ones we want to activate.
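
A note on the quoted configuration, offered as a sketch and not as a confirmed diagnosis. Flume property keys are case-sensitive, and the HDFS sink reads its batch size from hdfs.batchSize, so the batchsize line in the quoted sink section is probably ignored and the sink falls back to its default batch size. The threads line is also attached to avro-soruce1, which does not match the avro-source1 name used on the other source lines. A corrected fragment might look like the one below. The transactionCapacity line is an assumption added here, because a memory channel's transactionCapacity has to be at least as large as the sink's batch size. The value 10000 is illustrative only and has not been tuned or benchmarked.

```properties
# Sketch only: values are illustrative, not tuned or measured.

# Memory channel. transactionCapacity is an assumed addition (not in the
# original post); it must be >= the sink's hdfs.batchSize and <= capacity.
collector2.channels.ch2.type = memory
collector2.channels.ch2.capacity = 500000
collector2.channels.ch2.transactionCapacity = 10000
collector2.channels.ch2.keep-alive = 1

# Avro source. The component name is spelled avro-source1 so the setting
# lands on the same source as the other avro-source1 lines.
collector2.sources.avro-source1.threads = 10

# HDFS sink. The batch size lives under the hdfs. prefix, with a capital S.
collector2.sinks.hdfs.hdfs.batchSize = 10000
```

Whether this closes the gap between 20 MB per minute and the 45 MB/sec reported in the reply depends on the rest of the setup, including the upstream agents and the HDFS cluster, which the thread does not describe.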