Flume >> mail # dev >> [DISCUSS] Feature bloat and contrib module

Re: [DISCUSS] Feature bloat and contrib module

IMHO, contrib modules seem better for the following reasons:
1. Keep the core as thin as possible. I like the idea of a pluggable Flume
where the user adds the components needed, and only these. I imagine that
realistically, most users only use a handful of components, and therefore
don't need the whole library of every existing (or supported) sink and
source on their localhost. We could make the process of adding/removing
components easier, so that it becomes trivial for the user to
download/install/activate them.
2. License considerations. I can envision cases where one would want to
integrate Flume with another system that uses a license that's not
compatible with Apache's. So whether a contributor needs or wants to use a
different license, this contribution cannot currently be added to Flume. I'm
not an expert on licenses, but I wonder if it would be possible to include
these contributions using a contrib module.
3. Easiest way in. "Getting in" becomes trivial and open to all. Seems to me
like the best way to grow the project.
4. Community-based out. With a contrib project, we actually don't really
need to move contributions out. The community, if able to vote or report
usage, naturally manages which contributions are used and which aren't.
5. Competition and maintenance. As software engineers, there are always
tradeoffs we need to make. Imagine a component that could have its
performance increased at the cost of, for example, compatibility with some
other systems. Why would this optimization have to conflict with Apache's
main component? Couldn't both live side-by-side, and let the user choose the
one that better fits his/her specific context and requirements?
So to answer the original discussion questions: I'd argue that contrib
modules would benefit Flume, that they should be released on their own
schedule, supported independently, and be compatible with whatever version
of Flume the authors wish.
I like cathedrals, and I tend to design my applications like that. But in
this case, I believe a little bit of bazaar would be best.
I hope this helps.


From:  Bruno Mahé <[EMAIL PROTECTED]>
Date:  Saturday, December 21, 2013 4:29 PM
Subject:  Re: [DISCUSS] Feature bloat and contrib module

See inline.

On 12/20/2013 04:01 PM, Mike Percy wrote:
>  On Mon, Dec 16, 2013 at 11:34 PM, Bruno Mahé <[EMAIL PROTECTED]> wrote:
>>  Summarizing my suggestions:
>>  * Committers are not the sole developers. There is no reason for committers
>>  to take all these responsibilities on their shoulders. Also developer !=
>>  committer.
>>  * Easy IN, Easy OUT. If no one volunteers to maintain something, then
>>  there is no reason to keep it since the community is not interested in it
>>  anyway.
>>  * Easy to get in means more contributions and more contributors. Also a
>>  way to grow community and have contributors become full committers. It is
>>  more than likely they will notice things that can be improved elsewhere and
>>  start being more active overall.
>>  * Easy to get out means only the maintained stuff stays. Stuff would most
>>  likely get kicked out before a feature release (ex: 1.5 vs 1.6). Bug fix
>>  releases have no reason to kick out components since they are unlikely to
>>  break in between bug fix releases (ex: 1.5.2 vs 1.5.3).
>>  * Spreading sources and sinks is going to be quite hard on users. This
>>  would mean users would have to be developers themselves since they would
>>  have to:
>>       - Find the source/sink on some random repository which may or may not
>>  be maintained. Pick one of the repositories out of all the ones the user has
>>  found
>>       - Build it against their own version of Apache Flume (Apache, CDH,
>>  PHD, HDP...)
>>       - Resolve dependencies and build issues between their version of
>>  Apache Flume and source/sink since the source/sink may or may not have been
>>  maintained
>>       - Qualify the integration between their version of Apache Flume and

First of all, it is very hard to quantify users. And also they tend to
be silent if everything goes well.
Also I don't see the issue if just one person is using something. If
such a component is maintained and does not create a burden, why remove
it? Keeping it would be easy and make everyone happy.

The way I would see components being removed would rather be based on
their maintenance cost. For instance, is it blocking a release? Or have
all its tests failed for the past few weeks and no one cares?

Regarding compatibility between versions, I would say:
* You don't have to remove the component as soon as you create the
ticket to remove it. This way you give enough opportunity for some
people to step up and take over before kicking it out.

* Manage expectations by adding labels to the components. When a
component is introduced, label it as "experimental", then a few releases
later "beta" and then a few releases later "stable". Note that some
other criteria can be added to the labeling but this gives the
opportunity to announce that experimental components may not survive the
next release. This way you can ensure that stable components remain
backward compatible while giving you the option to remove the
unmaintained/unstable ones.
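
To make the labeling idea a bit more concrete, here is a minimal sketch in Java. Everything in it is hypothetical and for illustration only: the ComponentLabels class, the Stability enum, and the removal rule (a component is a removal candidate at a feature release only if it is unmaintained and not yet "stable") are assumptions of this sketch, not existing Flume code or an agreed policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ComponentLabels {
    /** Hypothetical maturity labels, ordered from least to most mature. */
    enum Stability { EXPERIMENTAL, BETA, STABLE }

    /** Moves a label one step up the ladder; STABLE stays STABLE. */
    static Stability promote(Stability current) {
        if (current == Stability.STABLE) {
            return current;
        }
        return Stability.values()[current.ordinal() + 1];
    }

    /**
     * Assumed rule for this sketch: only unmaintained components that have
     * not reached STABLE may be dropped at a feature release.
     */
    static boolean removableAtFeatureRelease(Stability label, boolean maintained) {
        return label != Stability.STABLE && !maintained;
    }

    public static void main(String[] args) {
        // Component names below are made up for the example.
        Map<String, Stability> components = new LinkedHashMap<>();
        components.put("example-sink", Stability.EXPERIMENTAL);
        components.put("example-source", Stability.STABLE);

        for (Map.Entry<String, Stability> e : components.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue()
                + ", removable if unmaintained: "
                + removableAtFeatureRelease(e.getValue(), false));
        }
    }
}
```

Other criteria (test health, whether it blocks a release) could feed into the same check; the point is only that the label tells users up front what to expect.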

Yeah, this is just a related note from a user-experience standpoint.
This would keep an Apache Flume installation lean and to the point. It's
more about tailoring the installation to the need than about solving
dependency issues.

All GNU/Linux distributions and even Apache Bigtop face that very same
issue. And the right answer would be to fix the issue upstream or to use
some of the tricks you cite above. And it's also pretty abstract without
concrete cases. So can you point to tickets or describe in more detail the
issue between Apache Solr and Elasticsearch? Maybe we can use that to derive a

Dependency conflicts should still be pretty rare though. So I would
not throw out the baby with the bat