Flume >> mail # user >> HDFS Sink Performance


Re: Re: Re: HDFS Sink Performance
Do you get better performance when you write to the cluster directly? Can
you run some tests writing directly to the cluster and compare?
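
A quick way to establish that baseline (a minimal sketch; the test file
path and HDFS URI below are assumptions, not taken from this thread):

    # Generate ~1GB of test data locally
    dd if=/dev/zero of=/tmp/flume-baseline.dat bs=1M count=1024

    # Time a direct write to the same cluster the sink targets
    time hadoop fs -put /tmp/flume-baseline.dat hdfs://namenode:8020/user/root/flume/baseline.dat

If the direct put sustains tens of MB/sec, the bottleneck is somewhere in
the Flume pipeline rather than in HDFS itself.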

On Mon, Aug 27, 2012 at 8:19 PM, Shara Shi <[EMAIL PROTECTED]> wrote:

> Hi Denny,
>
> It is 20MB/min, I confirmed.
>
> I sent data via avro-client from the local machine to the Flume agent,
> and I really did get only 20MB/min.
>
> So I am trying to find out why.
>
> Regards,
> Shara
>
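For reference, the kind of avro-client test described above is usually run
with the flume-ng launcher (a minimal sketch; host, port, and input file
are assumptions chosen to match the config later in the thread):

    # Stream a local file into the collector's Avro source on port 41415
    flume-ng avro-client -H localhost -p 41415 -F /tmp/testdata.log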
> *From:* Denny Ye [mailto:[EMAIL PROTECTED]]
> *Sent:* August 28, 2012 11:02
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Re: HDFS Sink Performance
>
> 20MB/min or 20MB/sec?
>
> I suspect that may be a typo in the units. Can you confirm it?
>
> -Regards
> Denny Ye
>
> 2012/8/28 Shara Shi <[EMAIL PROTECTED]>
>
> Hi Denny,
>
> A throughput of 45MB/sec would be fine for me,
> but I am only getting 20MB per minute.
> What's wrong with my configuration?
>
> Regards,
> Shara
>
> *From:* Denny Ye [mailto:[EMAIL PROTECTED]]
> *Sent:* August 27, 2012 20:05
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: HDFS Sink Performance
>
> hi Shara,
>     You are using MemoryChannel as the repository. I tested it and saw
> 45MB/sec without full GC, running locally with updated code. Is that
> your goal, or do you need even higher throughput?
>
> -Regards
> Denny Ye
>
> 2012/8/27 Shara Shi <[EMAIL PROTECTED]>
>
> Hi All,
>
> Whatever HDFS sink parameters I tune, I cannot get throughput higher
> than 20MB per minute.
> Is that normal? I think it is weird.
> How can I improve it?
>
> Regards,
> Ruihong Shi
>
> ==========================================
>
> # Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements.  See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership.  The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License.  You may obtain a copy of the License at
> #
> #  http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing,
> # software distributed under the License is distributed on an
> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> # KIND, either express or implied.  See the License for the
> # specific language governing permissions and limitations
> # under the License.
>
> # Define a memory channel called ch2 on collector2
> collector2.channels.ch2.type = memory
> collector2.channels.ch2.capacity = 500000
> collector2.channels.ch2.keep-alive = 1
>
> # Define an Avro source called avro-source1 on collector2 and tell it
> # to bind to 0.0.0.0:41415. Connect it to channel ch2.
> collector2.sources.avro-source1.channels = ch2
> collector2.sources.avro-source1.type = avro
> collector2.sources.avro-source1.bind = 0.0.0.0
> collector2.sources.avro-source1.port = 41415
> collector2.sources.avro-source1.threads = 10
>
> # Define an HDFS sink
> collector2.sinks.hdfs.channel = ch2
> collector2.sinks.hdfs.type = hdfs
> collector2.sinks.hdfs.hdfs.path = hdfs://namenode:8020/user/root/flume/webdata/exec/%Y/%m/%d/%H
> collector2.sinks.hdfs.batchsize = 50000
> collector2.sinks.hdfs.runner.type = polling
> collector2.sinks.hdfs.runner.polling.interval = 1
> collector2.sinks.hdfs.hdfs.rollInterval = 120
> collector2.sinks.hdfs.hdfs.rollSize = 0
> collector2.sinks.hdfs.hdfs.rollCount = 300000
> collector2.sinks.hdfs.hdfs.fileType = DataStream
> collector2.sinks.hdfs.hdfs.round = true
> collector2.sinks.hdfs.hdfs.roundValue = 10
> collector2.sinks.hdfs.hdfs.roundUnit = minute
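
Two entries in that sink section likely never take effect, which would keep
the sink flushing very small batches. A minimal corrected sketch, assuming a
Flume 1.x (NG) release, where HDFS sink parameters live under the "hdfs."
prefix and the 0.9-style runner.* keys are ignored:

    # "batchsize" is not an NG property; spell it hdfs.batchSize
    collector2.sinks.hdfs.hdfs.batchSize = 50000

    # Legacy 0.9 runner settings have no NG equivalent and can be removed:
    # collector2.sinks.hdfs.runner.type = polling
    # collector2.sinks.hdfs.runner.polling.interval = 1

With the batch key misspelled, the sink falls back to its small default and
syncs to HDFS far more often, which could explain throughput in the
MB-per-minute range. Note also that hdfs.batchSize cannot exceed the
channel's transactionCapacity (100 by default for the memory channel), so
that may need raising as well.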