Shara Shi 2012-08-27, 09:26
Denny Ye 2012-08-27, 12:04
Shara Shi 2012-08-28, 01:59
Denny Ye 2012-08-28, 03:02
Shara Shi 2012-08-28, 03:19
Mohit Anchlia 2012-08-28, 04:48
Shara Shi 2012-08-28, 05:08
Re: RE: RE: HDFS SINK Performacne
Patrick Wendell 2012-08-28, 05:11
Hey,

Can you let us know what rate data is arriving at collector2 at? How many events/second and bytes/second, roughly?

Also, why is your batch size so large? I'm not sure, but I think it may wait until it has received batchSize events before it decides to flush them to HDFS... so this may create strange results depending on how many events/second you have.

- Patrick

On Mon, Aug 27, 2012 at 9:48 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> Do you get better performance when you directly write to the cluster? Can
> you perform some tests writing to cluster directly and compare?
>
>
> On Mon, Aug 27, 2012 at 8:19 PM, Shara Shi <[EMAIL PROTECTED]> wrote:
>>
>> Hi Denny
>>
>> It is 20MB /min , I confirmed
>>
>> I sent data from avro-client from local to flume agent , I really got
>> 20MB/min
>>
>> So I try to find out the reason why.
>>
>> Regards
>>
>> Shara
>>
>> From: Denny Ye [mailto:[EMAIL PROTECTED]]
>> Sent: August 28, 2012 11:02
>> To: [EMAIL PROTECTED]
>> Subject: Re: RE: HDFS SINK Performacne
>>
>> 20MB/min or 20MB/sec?
>>
>> I doubt that it may have presentation mistake. Can you confirm it?
>>
>> -Regards
>>
>> Denny Ye
>>
>> 2012/8/28 Shara Shi <[EMAIL PROTECTED]>
>>
>> Hi Denny
>>
>> The throughput is 45MB/sec is OK for me .
>>
>> But I just got 20M / Minutes
>>
>> What’s wrong with my configuration?
>>
>> Regards
>>
>> Shara
>>
>> From: Denny Ye [mailto:[EMAIL PROTECTED]]
>> Sent: August 27, 2012 20:05
>> To: [EMAIL PROTECTED]
>> Subject: Re: HDFS SINK Performacne
>>
>> hi Shara,
>>
>> You are using MemoryChannel as repository. I tested it with outcomes:
>> 45MB/sec without full GC in local updated code. Is this your goal? or more
>> high throughput?
>>
>> -Regards
>>
>> Denny Ye
>>
>> 2012/8/27 Shara Shi <[EMAIL PROTECTED]>
>>
>> Hi All,
>>
>> Whatever I have tuned parameters of hdfs sink, It can’t get higher
>> performance over than 20MB per minutes.
>>
>> Is that normal? I think it is weird.
>>
>> How can I improve it
>>
>> Regards
>>
>> Ruihong Shi
>>
>> =========================================
>>
>> # or more contributor license agreements. See the NOTICE file
>> # distributed with this work for additional information
>> # regarding copyright ownership. The ASF licenses this file
>> # to you under the Apache License, Version 2.0 (the
>> # "License"); you may not use this file except in compliance
>> # with the License. You may obtain a copy of the License at
>> #
>> # http://www.apache.org/licenses/LICENSE-2.0
>> #
>> # Unless required by applicable law or agreed to in writing,
>> # software distributed under the License is distributed on an
>> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
>> # KIND, either express or implied. See the License for the
>> # specific language governing permissions and limitations
>> # under the License.
>>
>> # Define a memory channel called ch1 on collector1
>> collector2.channels.ch2.type = memory
>> collector2.channels.ch2.capacity=500000
>> collector2.channels.ch2.keep-alive=1
>>
>> # Define an Avro source called avro-source1 on agent1 and tell it
>> # to bind to 0.0.0.0:41414. Connect it to channel ch1.
>> collector2.sources.avro-source1.channels = ch2
>> collector2.sources.avro-source1.type = avro
>> collector2.sources.avro-source1.bind = 0.0.0.0
>> collector2.sources.avro-source1.port = 41415
>> collector2.sources.avro-soruce1.threads = 10
>>
>> # Define a hdfs sink
>> collector2.sinks.hdfs.channel = ch2
>> collector2.sinks.hdfs.type= hdfs
>> collector2.sinks.hdfs.hdfs.path=hdfs://namenode:8020/user/root/flume/webdata/exec/%Y/%m/%d/%H
>> collector2.sinks.hdfs.batchsize=50000
>> collector2.sinks.hdfs.runner.type=polling
>> collector2.sinks.hdfs.runner.polling.interval = 1
>> collector2.sinks.hdfs.hdfs.rollInterval = 120
>> collector2.sinks.hdfs.hdfs.rollSize =0
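For readers comparing against the configuration quoted in this message, here is a minimal sketch of the same sink and channel section written with the property names I understand the Flume 1.x user guide to use. It is not taken from the thread, and nothing in this message establishes that these spellings are the cause of the 20MB/min figure. The batch size of 1000, the transactionCapacity line, and the rollCount line are illustrative assumptions, and option names and defaults may differ between Flume versions, so check them against the guide for the version in use.

```properties
# Sketch only: values are illustrative assumptions, not tested recommendations.
# Agent, channel, source and sink names are the ones from the quoted configuration.
collector2.sinks.hdfs.type = hdfs
collector2.sinks.hdfs.channel = ch2
collector2.sinks.hdfs.hdfs.path = hdfs://namenode:8020/user/root/flume/webdata/exec/%Y/%m/%d/%H

# HDFS sink options carry the "hdfs." prefix and are case-sensitive,
# so the batch size is "hdfs.batchSize" rather than "batchsize".
# 1000 is an assumed starting point, not a measured optimum.
collector2.sinks.hdfs.hdfs.batchSize = 1000

# Roll files on time only; 0 disables size-based and count-based rolling.
collector2.sinks.hdfs.hdfs.rollInterval = 120
collector2.sinks.hdfs.hdfs.rollSize = 0
collector2.sinks.hdfs.hdfs.rollCount = 0

# The memory channel's per-transaction limit has to be at least as large
# as the sink batch size, since the sink takes one batch per transaction.
collector2.channels.ch2.type = memory
collector2.channels.ch2.capacity = 500000
collector2.channels.ch2.transactionCapacity = 1000

# Source name spelled the same way as in the other source properties.
collector2.sources.avro-source1.threads = 10
```

Measuring the events/second and bytes/second arriving at collector2, as asked above, would still be needed to tell whether the sink or the incoming rate is the limit.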
Shara Shi 2012-08-28, 05:42
Brock Noland 2012-08-28, 11:47