Flume, mail # user - HDFS SINK Performacne


Shara Shi 2012-08-27, 09:26
Denny Ye 2012-08-27, 12:04
Shara Shi 2012-08-28, 01:59
Denny Ye 2012-08-28, 03:02
Shara Shi 2012-08-28, 03:19
Re: Re: Re: HDFS SINK Performacne
Mohit Anchlia 2012-08-28, 04:48
Do you get better performance when you write to the cluster directly? Can
you run some tests writing to the cluster directly and compare?
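Mohit's suggested test can be sketched roughly as follows; the file size, paths, and the 5-second timing are illustrative assumptions, not measurements from the thread:

```shell
# Rough direct-write benchmark sketch (assumed paths and sizes).
# Create a 100 MB test file locally.
dd if=/dev/zero of=/tmp/flume_bench.dat bs=1M count=100 2>/dev/null

# On a machine with HDFS access, time a direct put (uncomment to run):
# start=$(date +%s)
# hadoop fs -put /tmp/flume_bench.dat /user/root/flume/bench.dat
# end=$(date +%s)
# elapsed_s=$(( end - start ))

# Throughput in MB/s; e.g. 100 MB in an assumed 5 s:
size_mb=100
elapsed_s=5
echo "$(( size_mb / elapsed_s )) MB/s"
```

Comparing that number against the roughly 0.33 MB/s (20 MB/min) the sink achieves would show whether the cluster or the Flume pipeline is the limit.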

On Mon, Aug 27, 2012 at 8:19 PM, Shara Shi <[EMAIL PROTECTED]> wrote:

>  Hi Denny
>
> It is 20MB/min, I confirmed.
>
> I sent data from avro-client on the local machine to the Flume agent, and I
> really got 20MB/min, so I am trying to find out why.
>
> Regards
>
> Shara
>
> *From:* Denny Ye [mailto:[EMAIL PROTECTED]]
> *Sent:* August 28, 2012 11:02
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Re: HDFS SINK Performacne
>
>
> 20MB/min or 20MB/sec?
>
> I suspect it may be a presentation mistake. Can you confirm it?
>
> -Regards
>
> Denny Ye
>
> 2012/8/28 Shara Shi <[EMAIL PROTECTED]>
>
> Hi Denny
>
> A throughput of 45MB/sec would be fine for me,
> but I only got 20MB per minute.
> What's wrong with my configuration?
>
> Regards
>
> Shara
>
> *From:* Denny Ye [mailto:[EMAIL PROTECTED]]
> *Sent:* August 27, 2012 20:05
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: HDFS SINK Performacne
>
>
> hi Shara,
>
>     You are using MemoryChannel as the repository. I tested it with this
> outcome: 45MB/sec without full GC, using locally updated code. Is this your
> goal, or do you need even higher throughput?
>
> -Regards
>
> Denny Ye
>
> 2012/8/27 Shara Shi <[EMAIL PROTECTED]>
>
> Hi All,
>
> Whatever hdfs sink parameters I have tuned, I can't get
> performance higher than 20MB per minute.
> Is that normal? I think it is weird.
> How can I improve it?
>
> Regards
>
> Ruihong Shi
>
> ==========================================
>
>
> # Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements.  See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership.  The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License.  You may obtain a copy of the License at
> #
> #  http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing,
> # software distributed under the License is distributed on an
> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> # KIND, either express or implied.  See the License for the
> # specific language governing permissions and limitations
> # under the License.
>
>
> # Define a memory channel called ch2 on collector2
> collector2.channels.ch2.type = memory
> collector2.channels.ch2.capacity = 500000
> collector2.channels.ch2.keep-alive = 1
>
> # Define an Avro source called avro-source1 on collector2 and tell it
> # to bind to 0.0.0.0:41415. Connect it to channel ch2.
> collector2.sources.avro-source1.channels = ch2
> collector2.sources.avro-source1.type = avro
> collector2.sources.avro-source1.bind = 0.0.0.0
> collector2.sources.avro-source1.port = 41415
> collector2.sources.avro-source1.threads = 10
>
> # Define a hdfs sink
> collector2.sinks.hdfs.channel = ch2
> collector2.sinks.hdfs.type = hdfs
> collector2.sinks.hdfs.hdfs.path = hdfs://namenode:8020/user/root/flume/webdata/exec/%Y/%m/%d/%H
> collector2.sinks.hdfs.batchsize = 50000
> collector2.sinks.hdfs.runner.type = polling
> collector2.sinks.hdfs.runner.polling.interval = 1
> collector2.sinks.hdfs.hdfs.rollInterval = 120
> collector2.sinks.hdfs.hdfs.rollSize = 0
> collector2.sinks.hdfs.hdfs.rollCount = 300000
> collector2.sinks.hdfs.hdfs.fileType = DataStream
> collector2.sinks.hdfs.hdfs.round = true
> collector2.sinks.hdfs.hdfs.roundValue = 10
> collector2.sinks.hdfs.hdfs.roundUnit = minute
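If the sink itself is the bottleneck, a few Flume 1.x knobs are worth checking against the config above. The key names below come from the Flume 1.x HDFS sink and memory channel documentation; the values are illustrative assumptions, not settings from the thread:

```properties
# In Flume 1.x the HDFS sink batch size key is hdfs.batchSize (under the
# hdfs. prefix); a bare "batchsize" is not a documented key, in which case
# the default of 100 events per flush would apply.
collector2.sinks.hdfs.hdfs.batchSize = 10000

# More writer threads can help when a single append stream is the limit.
collector2.sinks.hdfs.hdfs.threadsPoolSize = 10

# transactionCapacity bounds how many events each take/put batch may move;
# the default is far below the batch sizes used above.
collector2.channels.ch2.transactionCapacity = 10000

# Note: the runner.type / runner.polling.interval keys come from the older
# Flume 0.9.x line and, as far as I can tell, are ignored by 1.x agents.
```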
Shara Shi 2012-08-28, 05:08
Patrick Wendell 2012-08-28, 05:11
Shara Shi 2012-08-28, 05:42
Brock Noland 2012-08-28, 11:47