

Re: Re: Re: HDFS SINK Performance
Hey,

Can you let us know the rate at which data is arriving at collector2?
Roughly how many events/second and bytes/second?

Also, why is your batch size so large? I'm not sure, but I think the
sink may wait until it has received batchSize events before it decides to
flush them to HDFS, so this may produce strange results depending on
how many events/second you have.
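
To make the batch-size point concrete, here is a rough sketch of what I would try on the sink. It assumes the key names from the Flume user guide (hdfs.batchSize, hdfs.rollCount); the values are only illustrative guesses, not tuned recommendations, and I have not run this against your setup:

```properties
# Sketch only: values are illustrative assumptions, not measured settings.
# The user guide spells the key hdfs.batchSize; the posted config uses
# "batchsize" without the hdfs. prefix, which may not be picked up.
collector2.sinks.hdfs.hdfs.batchSize = 100
# Assumption: rollCount falls back to a small event count when unset, so
# disable count-based rolling if only rollInterval should apply.
collector2.sinks.hdfs.hdfs.rollCount = 0
```

If throughput changes noticeably with these settings, that would suggest the sink was limited by batching or file rolling rather than by HDFS itself.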

- Patrick

On Mon, Aug 27, 2012 at 9:48 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> Do you get better performance when you directly write to the cluster? Can
> you perform some tests writing to cluster directly and compare?
>
>
> On Mon, Aug 27, 2012 at 8:19 PM, Shara Shi <[EMAIL PROTECTED]> wrote:
>>
>> Hi Denny
>>
>>
>>
>> It is 20 MB/min, I confirmed.
>>
>> I sent data with avro-client from my local machine to the Flume agent, and I
>> really got 20 MB/min.
>>
>> So I am trying to find out why.
>>
>>
>>
>> Regards
>>
>> Shara
>>
>> From: Denny Ye [mailto:[EMAIL PROTECTED]]
>> Sent: August 28, 2012 11:02
>> To: [EMAIL PROTECTED]
>> Subject: Re: Re: HDFS SINK Performance
>>
>>
>>
>> 20 MB/min or 20 MB/sec?
>>
>> I suspect there may be a mistake in the units as written. Can you confirm?
>>
>>
>>
>> -Regards
>>
>> Denny Ye
>>
>> 2012/8/28 Shara Shi <[EMAIL PROTECTED]>
>>
>> Hi Denny
>>
>>
>>
>> A throughput of 45 MB/sec would be OK for me.
>>
>> But I only get 20 MB/minute.
>>
>> What’s wrong with my configuration?
>>
>>
>>
>> Regards
>>
>> Shara
>>
>>
>>
>>
>>
>> From: Denny Ye [mailto:[EMAIL PROTECTED]]
>> Sent: August 27, 2012 20:05
>> To: [EMAIL PROTECTED]
>> Subject: Re: HDFS SINK Performance
>>
>>
>>
>> Hi Shara,
>>
>>     You are using MemoryChannel as the repository. I tested it and got
>> 45 MB/sec without a full GC, using locally updated code. Is this your goal,
>> or do you need higher throughput?
>>
>>
>>
>> -Regards
>>
>> Denny Ye
>>
>> 2012/8/27 Shara Shi <[EMAIL PROTECTED]>
>>
>> Hi All,
>>
>>
>>
>> No matter how I tune the HDFS sink parameters, I can’t get throughput
>> higher than 20 MB per minute.
>>
>> Is that normal? I think it is weird.
>>
>> How can I improve it?
>>
>>
>>
>> Regards
>>
>> Ruihong Shi
>>
>> =========================================
>>
>>
>> # or more contributor license agreements.  See the NOTICE file
>>
>> # distributed with this work for additional information
>>
>> # regarding copyright ownership.  The ASF licenses this file
>>
>> # to you under the Apache License, Version 2.0 (the
>>
>> # "License"); you may not use this file except in compliance
>>
>> # with the License.  You may obtain a copy of the License at
>>
>> #
>>
>> #  http://www.apache.org/licenses/LICENSE-2.0
>>
>> #
>>
>> # Unless required by applicable law or agreed to in writing,
>>
>> # software distributed under the License is distributed on an
>>
>> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
>>
>> # KIND, either express or implied.  See the License for the
>>
>> # specific language governing permissions and limitations
>>
>> # under the License.
>>
>>
>>
>> # Define a memory channel called ch1 on collector1
>>
>> collector2.channels.ch2.type = memory
>>
>> collector2.channels.ch2.capacity=500000
>>
>> collector2.channels.ch2.keep-alive=1
>>
>>
>>
>>
>>
>> # Define an Avro source called avro-source1 on agent1 and tell it
>>
>> # to bind to 0.0.0.0:41414. Connect it to channel ch1.
>>
>> collector2.sources.avro-source1.channels = ch2
>>
>> collector2.sources.avro-source1.type = avro
>>
>> collector2.sources.avro-source1.bind = 0.0.0.0
>>
>> collector2.sources.avro-source1.port = 41415
>>
>> collector2.sources.avro-soruce1.threads = 10
>>
>>
>>
>>
>>
>> # Define a hdfs sink
>>
>> collector2.sinks.hdfs.channel = ch2
>>
>> collector2.sinks.hdfs.type= hdfs
>>
>>
>> collector2.sinks.hdfs.hdfs.path=hdfs://namenode:8020/user/root/flume/webdata/exec/%Y/%m/%d/%H
>>
>> collector2.sinks.hdfs.batchsize=50000
>>
>> collector2.sinks.hdfs.runner.type=polling
>>
>> collector2.sinks.hdfs.runner.polling.interval = 1
>>
>> collector2.sinks.hdfs.hdfs.rollInterval = 120
>>
>> collector2.sinks.hdfs.hdfs.rollSize =0