-
New blog post on Flume performance tuning (Mike Percy, 2013-01-11, 20:13)
Hi folks,
I just posted to the Apache blog on how to do performance tuning with Flume. I plan on following it up with a post about using the Flume monitoring capabilities while tuning. Feedback is welcome.

https://blogs.apache.org/flume/entry/flume_performance_tuning_part_1

Regards,
Mike
-
Re: New blog post on Flume performance tuning (Brock Noland, 2013-01-11, 20:18)
Nice post!
--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
-
Re: New blog post on Flume performance tuning (Mike Percy, 2013-01-11, 20:45)
Thanks Brock! I've been working on this, off and on, for a while. :)
-
Re: New blog post on Flume performance tuning (Mohammad Tariq, 2013-01-11, 20:48)
+1
Thank you so much, Mike, for all the good work.

Warm regards,
Tariq
https://mtariq.jux.com/
-
Re: New blog post on Flume performance tuning (Xu, 2013-01-11, 20:59)
Great post, Mike!
One question, which you could address either on the mailing list or in a future post: how can duplicated messages be removed in this flow? For example, when I set up a switch/router to send syslog messages, I'd like it to send them to two syslog collectors or two Flume agents. The switch/router is a dumb device that doesn't know how to fail over or load balance, so we end up with two copies of each message going into Flume.

I have seen people describe using HBase operations to remove duplicates, but I am wondering whether we can do anything within the Flume infrastructure itself.

Thanks.
-Simon
-
Re: New blog post on Flume performance tuning (Mike Percy, 2013-01-11, 22:27)
Hi Simon,
There is no good way that I am aware of for Flume to dedup messages. There is no abstraction for doing pairwise comparison of events, and as you scale up, maintaining some kind of hash table of processed events generally becomes prohibitively expensive, or at least not worth the effort, at the streaming layer.

The most straightforward way to dedup Flume events is to tag each one with a unique ID at event creation time. You can then dedup with a MapReduce job (when writing to HDFS) or by making your operations idempotent (for example, when writing keys to HBase).

Regards,
Mike
-
Re: New blog post on Flume performance tuning (Alexander Alten-Lorenz, 2013-01-12, 08:44)
Great post, +1 man!
--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
-
Re: New blog post on Flume performance tuning (Mohit Anchlia, 2013-01-15, 17:49)
We have the memory channel's capacity set to 10000 and its transactionCapacity set to 500, which gives us good performance:

    webanalytics.channels.memoryChannel.capacity = 10000
    webanalytics.channels.memoryChannel.transactionCapacity = 500

I am trying to understand the downside of using these values. Is it OK to set them this large?
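To make the two settings concrete: in Flume's memory channel, `capacity` bounds how many events the channel can buffer at once, and `transactionCapacity` bounds how many events a single source put or sink take can move. The main costs of a large capacity, as we understand them, are that buffered events live in the agent's JVM heap, and that the memory channel is not durable, so everything buffered is lost if the agent process dies. The following is a toy Python model, not Flume's implementation: `ToyMemoryChannel`, `ChannelFullError`, and `worst_case_heap_bytes` are hypothetical names. The heap estimate also assumes a hypothetical 1 KB average event size and ignores per-object JVM overhead.

```python
from collections import deque
from typing import Deque, List


class ChannelFullError(Exception):
    """Raised when a put would exceed the channel's capacity."""


class ToyMemoryChannel:
    """Toy model of how capacity and transaction capacity interact.

    capacity: max events held in the channel at once.
    transaction_capacity: max events in a single put or take transaction.
    """

    def __init__(self, capacity: int, transaction_capacity: int) -> None:
        # A batch larger than the whole channel could never fit.
        if transaction_capacity > capacity:
            raise ValueError("transaction_capacity must not exceed capacity")
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self._queue: Deque[bytes] = deque()

    def put_batch(self, events: List[bytes]) -> None:
        # A source commits a batch as one transaction: all events or none.
        if len(events) > self.transaction_capacity:
            raise ValueError("batch larger than transaction_capacity")
        if len(self._queue) + len(events) > self.capacity:
            # Channel full: the source must back off and retry (backpressure).
            raise ChannelFullError("channel full")
        self._queue.extend(events)

    def take_batch(self, max_events: int) -> List[bytes]:
        # A sink drains at most transaction_capacity events per transaction.
        n = min(max_events, self.transaction_capacity, len(self._queue))
        return [self._queue.popleft() for _ in range(n)]

    def size(self) -> int:
        return len(self._queue)


def worst_case_heap_bytes(capacity: int, avg_event_bytes: int) -> int:
    """Rough payload bytes held in heap when the channel is completely full."""
    return capacity * avg_event_bytes


ch = ToyMemoryChannel(capacity=10000, transaction_capacity=500)
for _ in range(20):
    ch.put_batch([b"x" * 1024] * 500)  # 20 batches of 500 fill the channel
print(ch.size())  # 10000
try:
    ch.put_batch([b"x"])  # one more event does not fit
except ChannelFullError:
    print("backpressure")
print(worst_case_heap_bytes(10000, 1024))  # 10240000, about 9.8 MiB
```

In this model, a full channel rejects further puts, so a larger capacity mainly buys more buffering against a slow or stalled sink. The price is heap usage and more events at risk if the agent crashes.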