|
|
-
[DISCUSS] Annotating interface scope/stability in Flume
Mike Percy 2012-08-14, 09:38
It seems we have reached a point in some of the Flume components where we want to add features but that means adding new interfaces to maintain backwards compatibility. While this is natural, the more we do it, and the more we cast from interface to interface, the less readable and consistent the codebase becomes. Also, we have exposed much of our code as public class + public method, even when APIs may not be intended as stable extension points, for testing or other reasons. A few years ago, Hadoop faced this problem and ended up implementing annotations to document APIs as @Stable/@Evolving, @Public/@Limited/@Private. See < https://issues.apache.org/jira/browse/HADOOP-5073> for the history on that. I would like to propose the adoption of a similar mechanism in Flume, in order to give us more wiggle room in the future for evolutionary development. Thoughts? Right now, I feel we would get most of the bang for the buck simply by adding two annotations: @Public and @Internal, which to me means "you can subclass or instantiate this directly", or "you can't directly use this if you expect future compatibility". Regards, Mike
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Brock Noland 2012-08-14, 10:40
Hi, I think this makes sense. Can we clearly state and define a contract for Public and Internal? Brock On Tue, Aug 14, 2012 at 10:38 AM, Mike Percy <[EMAIL PROTECTED]> wrote: > It seems we have reached a point in some of the Flume components where we > want to add features but that means adding new interfaces to maintain > backwards compatibility. While this is natural, the more we do it, and the > more we cast from interface to interface, the less readable and consistent > the codebase becomes. Also, we have exposed much of our code as public > class + public method, even when APIs may not be intended as stable > extension points, for testing or other reasons. A few years ago, Hadoop > faced this problem and ended up implementing annotations to document APIs > as @Stable/@Evolving, @Public/@Limited/@Private. See < > https://issues.apache.org/jira/browse/HADOOP-5073> for the history on > that. > > I would like to propose the adoption of a similar mechanism in Flume, in > order to give us more wiggle room in the future for evolutionary > development. Thoughts? > > Right now, I feel we would get most of the bang for the buck simply by > adding two annotations: @Public and @Internal, which to me means "you can > subclass or instantiate this directly", or "you can't directly use this if > you expect future compatibility". > > Regards, > Mike > -- Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Jarek Jarcec Cecho 2012-08-14, 10:49
I'm also in favour of that idea. Jarcec On Tue, Aug 14, 2012 at 11:40:17AM +0100, Brock Noland wrote: > Hi, > > I think this makes sense. Can we clearly state and define a contract for > Public and Internal? > > Brock > > On Tue, Aug 14, 2012 at 10:38 AM, Mike Percy <[EMAIL PROTECTED]> wrote: > > > It seems we have reached a point in some of the Flume components where we > > want to add features but that means adding new interfaces to maintain > > backwards compatibility. While this is natural, the more we do it, and the > > more we cast from interface to interface, the less readable and consistent > > the codebase becomes. Also, we have exposed much of our code as public > > class + public method, even when APIs may not be intended as stable > > extension points, for testing or other reasons. A few years ago, Hadoop > > faced this problem and ended up implementing annotations to document APIs > > as @Stable/@Evolving, @Public/@Limited/@Private. See < > > https://issues.apache.org/jira/browse/HADOOP-5073> for the history on > > that. > > > > I would like to propose the adoption of a similar mechanism in Flume, in > > order to give us more wiggle room in the future for evolutionary > > development. Thoughts? > > > > Right now, I feel we would get most of the bang for the buck simply by > > adding two annotations: @Public and @Internal, which to me means "you can > > subclass or instantiate this directly", or "you can't directly use this if > > you expect future compatibility". > > > > Regards, > > Mike > > > > > > -- > Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Ralph Goers 2012-08-14, 16:14
I have a slightly different take on this. I've been trying to embed Flume within the Log4j 2 client and have found that things that look like they are extendable actually aren't and that Flume configuration is actually pretty brittle (the properties don't seem to be isolated to one place but percolate through several), such that if you want to create a configuration using XML, JSON or anything else you are forced to convert that into the property syntax. I've also noticed that the Sources, Sinks and Channels are all defined in enums - another brittle construct that puts third party add-ons at a disadvantage from stuff packaged with Flume. I tackled a lot of these issues in Log4j 2 and came up with different solutions for them. I used annotations, but not specifically to mark things as public or private but to identify "plugins". For example, Sources, Sinks and Channels could all be annotated and be made available by a short name specified on the annotation which would get rid of the need for the enums. Of course, these also identify components that can be used as models for developers and users to emulate to add their own components. While addition annotations as guidance to programmers is OK, you can accomplish the same thing just by writing good Javadoc. I'm more a fan of using annotations for things that are a bit more useful. Ralph On Aug 14, 2012, at 2:38 AM, Mike Percy wrote: > It seems we have reached a point in some of the Flume components where we > want to add features but that means adding new interfaces to maintain > backwards compatibility. While this is natural, the more we do it, and the > more we cast from interface to interface, the less readable and consistent > the codebase becomes. Also, we have exposed much of our code as public > class + public method, even when APIs may not be intended as stable > extension points, for testing or other reasons. A few years ago, Hadoop > faced this problem and ended up implementing annotations to document APIs > as @Stable/@Evolving, @Public/@Limited/@Private. See < > https://issues.apache.org/jira/browse/HADOOP-5073> for the history on that. > > I would like to propose the adoption of a similar mechanism in Flume, in > order to give us more wiggle room in the future for evolutionary > development. Thoughts? > > Right now, I feel we would get most of the bang for the buck simply by > adding two annotations: @Public and @Internal, which to me means "you can > subclass or instantiate this directly", or "you can't directly use this if > you expect future compatibility". > > Regards, > Mike
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Will McQueen 2012-08-14, 16:47
Something to consider: Eclipse uses the string 'internal' in their package path to denote packages (public or otherwise) that are not intended to be used as a public API: "All packages that are part of the platform implementation but contain no API that should be exposed to ISVs are considered internal implementation packages. All implementation packages should be flagged as internal, with the tag occurring just after the major package name. ISVs will be told that all packages marked internal are out of bounds. (A simple text search for ".internal." detects suspicious reference in source files; likewise, "/internal/" is suspicious in .class files). " [Ref: http://wiki.eclipse.org/Naming_Conventions#Internal_Implementation_Packages]Here are some additional links on evolving Java APIs: http://wiki.eclipse.org/Evolving_Java-based_APIshttp://wiki.eclipse.org/Evolving_Java-based_APIs_2http://wiki.eclipse.org/Evolving_Java-based_APIs_3>> Flume configuration is actually pretty brittle FLUME-1051 might address this concern. >>if you want to create a configuration using XML, JSON Flume is currently hardcoded to read from a Java Properties file. Please see Application.run(): AbstractFileConfigurationProvider configurationProvider = new PropertiesFileConfigurationProvider(); Cheers, Will On Tue, Aug 14, 2012 at 9:14 AM, Ralph Goers <[EMAIL PROTECTED]>wrote: > I have a slightly different take on this. > > I've been trying to embed Flume within the Log4j 2 client and have found > that things that look like they are extendable actually aren't and that > Flume configuration is actually pretty brittle (the properties don't seem > to be isolated to one place but percolate through several), such that if > you want to create a configuration using XML, JSON or anything else you are > forced to convert that into the property syntax. I've also noticed that > the Sources, Sinks and Channels are all defined in enums - another brittle > construct that puts third party add-ons at a disadvantage from stuff > packaged with Flume. > > I tackled a lot of these issues in Log4j 2 and came up with different > solutions for them. I used annotations, but not specifically to mark > things as public or private but to identify "plugins". For example, > Sources, Sinks and Channels could all be annotated and be made available by > a short name specified on the annotation which would get rid of the need > for the enums. Of course, these also identify components that can be used > as models for developers and users to emulate to add their own components. > > While addition annotations as guidance to programmers is OK, you can > accomplish the same thing just by writing good Javadoc. I'm more a fan of > using annotations for things that are a bit more useful. > > Ralph > > > On Aug 14, 2012, at 2:38 AM, Mike Percy wrote: > > > It seems we have reached a point in some of the Flume components where we > > want to add features but that means adding new interfaces to maintain > > backwards compatibility. While this is natural, the more we do it, and > the > > more we cast from interface to interface, the less readable and > consistent > > the codebase becomes. Also, we have exposed much of our code as public > > class + public method, even when APIs may not be intended as stable > > extension points, for testing or other reasons. A few years ago, Hadoop > > faced this problem and ended up implementing annotations to document APIs > > as @Stable/@Evolving, @Public/@Limited/@Private. See < > > https://issues.apache.org/jira/browse/HADOOP-5073> for the history on > that. > > > > I would like to propose the adoption of a similar mechanism in Flume, in > > order to give us more wiggle room in the future for evolutionary > > development. Thoughts? > > > > Right now, I feel we would get most of the bang for the buck simply by > > adding two annotations: @Public and @Internal, which to me means "you can > > subclass or instantiate this directly", or "you can't directly use this
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Arun C Murthy 2012-08-14, 17:15
On Aug 14, 2012, at 2:38 AM, Mike Percy wrote:
> Right now, I feel we would get most of the bang for the buck simply by > adding two annotations: @Public and @Internal, which to me means "you can > subclass or instantiate this directly", or "you can't directly use this if > you expect future compatibility".
Why not keep the same nomenclature as in Hadoop? It will help keeping some sort of commonality in the ecosystem?
Arun
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Ralph Goers 2012-08-14, 17:25
On Aug 14, 2012, at 9:47 AM, Will McQueen wrote:
> > >>> if you want to create a configuration using XML, JSON > Flume is currently hardcoded to read from a Java Properties file. Please > see Application.run(): > AbstractFileConfigurationProvider configurationProvider = new > PropertiesFileConfigurationProvider(); >
Yes, I know. That is only part of it. FlumeConfiguration also expects properties. PropertiesFileConfigurationProvider cannot be extended, even though it sort of looks like it can. All the useful methods are private.
It turns out I can't really leverage the way Flume startup works anyway. When the configuration changes it basically does a restart. That works fine in a standalone case since clients will fail over to another agent until reconfiguration is complete. But an embedded agent can't shutdown like that so I'm pretty much having to roll my own.
Ralph
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Ralph Goers 2012-08-14, 17:29
On Aug 14, 2012, at 9:47 AM, Will McQueen wrote: > Something to consider: Eclipse uses the string 'internal' in their package > path to denote packages (public or otherwise) that are not intended to be > used as a public API: > > "All packages that are part of the platform implementation but contain no > API that should be exposed to ISVs are considered internal implementation > packages. All implementation packages should be flagged as internal, with > the tag occurring just after the major package name. ISVs will be told that > all packages marked internal are out of bounds. (A simple text search for > ".internal." detects suspicious reference in source files; likewise, > "/internal/" is suspicious in .class files). " FWIW, I also think this is a good idea. > [Ref: > http://wiki.eclipse.org/Naming_Conventions#Internal_Implementation_Packages]> > Here are some additional links on evolving Java APIs: > http://wiki.eclipse.org/Evolving_Java-based_APIs> http://wiki.eclipse.org/Evolving_Java-based_APIs_2> http://wiki.eclipse.org/Evolving_Java-based_APIs_3> >>> Flume configuration is actually pretty brittle > FLUME-1051 might address this concern. > I'm not sure how. That issues seems more about guaranteeing validity than making the configuration provider more flexible. Ralph
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Patrick Wendell 2012-08-14, 19:38
We definitely need something like this, given that extensability is the expectation in Flume (not the exception). It's likely that the flume developer community will overlap with Hadoop developer community, so using a subset of the Hadoop annotations makes sense. @Private and @Public seem like a good fit for what we need right now. - Patrick On Tue, Aug 14, 2012 at 10:29 AM, Ralph Goers <[EMAIL PROTECTED]> wrote: > > On Aug 14, 2012, at 9:47 AM, Will McQueen wrote: > >> Something to consider: Eclipse uses the string 'internal' in their package >> path to denote packages (public or otherwise) that are not intended to be >> used as a public API: >> >> "All packages that are part of the platform implementation but contain no >> API that should be exposed to ISVs are considered internal implementation >> packages. All implementation packages should be flagged as internal, with >> the tag occurring just after the major package name. ISVs will be told that >> all packages marked internal are out of bounds. (A simple text search for >> ".internal." detects suspicious reference in source files; likewise, >> "/internal/" is suspicious in .class files). " > > FWIW, I also think this is a good idea. > > >> [Ref: >> http://wiki.eclipse.org/Naming_Conventions#Internal_Implementation_Packages]>> >> Here are some additional links on evolving Java APIs: >> http://wiki.eclipse.org/Evolving_Java-based_APIs>> http://wiki.eclipse.org/Evolving_Java-based_APIs_2>> http://wiki.eclipse.org/Evolving_Java-based_APIs_3>> >>>> Flume configuration is actually pretty brittle >> FLUME-1051 might address this concern. >> > I'm not sure how. That issues seems more about guaranteeing validity than making the configuration provider more flexible. > > > Ralph >
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Mike Percy 2012-08-15, 07:20
Yeah, it sounds reasonable to me to use a subset of the same annotations that Hadoop uses, such as @Public and @Private. As far as using ".internal." package names to denote non-public classes, that works for new classes but I don't think it really works for existing classes, which makes adopting it harder. Regards, Mike On Tue, Aug 14, 2012 at 12:38 PM, Patrick Wendell <[EMAIL PROTECTED]>wrote: > We definitely need something like this, given that extensability is > the expectation in Flume (not the exception). > > It's likely that the flume developer community will overlap with > Hadoop developer community, so using a subset of the Hadoop > annotations makes sense. @Private and @Public seem like a good fit for > what we need right now. > > - Patrick > > On Tue, Aug 14, 2012 at 10:29 AM, Ralph Goers > <[EMAIL PROTECTED]> wrote: > > > > On Aug 14, 2012, at 9:47 AM, Will McQueen wrote: > > > >> Something to consider: Eclipse uses the string 'internal' in their > package > >> path to denote packages (public or otherwise) that are not intended to > be > >> used as a public API: > >> > >> "All packages that are part of the platform implementation but contain > no > >> API that should be exposed to ISVs are considered internal > implementation > >> packages. All implementation packages should be flagged as internal, > with > >> the tag occurring just after the major package name. ISVs will be told > that > >> all packages marked internal are out of bounds. (A simple text search > for > >> ".internal." detects suspicious reference in source files; likewise, > >> "/internal/" is suspicious in .class files). " > > > > FWIW, I also think this is a good idea. > > > > > >> [Ref: > >> > http://wiki.eclipse.org/Naming_Conventions#Internal_Implementation_Packages> ] > >> > >> Here are some additional links on evolving Java APIs: > >> http://wiki.eclipse.org/Evolving_Java-based_APIs> >> http://wiki.eclipse.org/Evolving_Java-based_APIs_2> >> http://wiki.eclipse.org/Evolving_Java-based_APIs_3> >> > >>>> Flume configuration is actually pretty brittle > >> FLUME-1051 might address this concern. > >> > > I'm not sure how. That issues seems more about guaranteeing validity > than making the configuration provider more flexible. > > > > > > Ralph > > >
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Brock Noland 2012-08-15, 07:28
Agreed it may not work for all existing classes we would like to make internal. But it would work for things like FileChannel and MemoryChannel, no? Brock On Wed, Aug 15, 2012 at 8:20 AM, Mike Percy <[EMAIL PROTECTED]> wrote: > Yeah, it sounds reasonable to me to use a subset of the same annotations > that Hadoop uses, such as @Public and @Private. > > As far as using ".internal." package names to denote non-public classes, > that works for new classes but I don't think it really works for existing > classes, which makes adopting it harder. > > Regards, > Mike > > On Tue, Aug 14, 2012 at 12:38 PM, Patrick Wendell <[EMAIL PROTECTED] > >wrote: > > > We definitely need something like this, given that extensability is > > the expectation in Flume (not the exception). > > > > It's likely that the flume developer community will overlap with > > Hadoop developer community, so using a subset of the Hadoop > > annotations makes sense. @Private and @Public seem like a good fit for > > what we need right now. > > > > - Patrick > > > > On Tue, Aug 14, 2012 at 10:29 AM, Ralph Goers > > <[EMAIL PROTECTED]> wrote: > > > > > > On Aug 14, 2012, at 9:47 AM, Will McQueen wrote: > > > > > >> Something to consider: Eclipse uses the string 'internal' in their > > package > > >> path to denote packages (public or otherwise) that are not intended to > > be > > >> used as a public API: > > >> > > >> "All packages that are part of the platform implementation but contain > > no > > >> API that should be exposed to ISVs are considered internal > > implementation > > >> packages. All implementation packages should be flagged as internal, > > with > > >> the tag occurring just after the major package name. ISVs will be told > > that > > >> all packages marked internal are out of bounds. (A simple text search > > for > > >> ".internal." detects suspicious reference in source files; likewise, > > >> "/internal/" is suspicious in .class files). " > > > > > > FWIW, I also think this is a good idea. > > > > > > > > >> [Ref: > > >> > > > http://wiki.eclipse.org/Naming_Conventions#Internal_Implementation_Packages> > ] > > >> > > >> Here are some additional links on evolving Java APIs: > > >> http://wiki.eclipse.org/Evolving_Java-based_APIs> > >> http://wiki.eclipse.org/Evolving_Java-based_APIs_2> > >> http://wiki.eclipse.org/Evolving_Java-based_APIs_3> > >> > > >>>> Flume configuration is actually pretty brittle > > >> FLUME-1051 might address this concern. > > >> > > > I'm not sure how. That issues seems more about guaranteeing validity > > than making the configuration provider more flexible. > > > > > > > > > Ralph > > > > > > -- Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Arun C Murthy 2012-08-15, 07:29
On Aug 15, 2012, at 12:20 AM, Mike Percy wrote: > Yeah, it sounds reasonable to me to use a subset of the same annotations > that Hadoop uses, such as @Public and @Private. > You guys might also want to consider the Hadoop 'Stability' notation: Stable, Evolving, Unstable. FWIW, Stability is different from Audience (Public/Private/LimitedPrivate). This is derived from Sun etc., see HADOOP-5073 for more context. Arun > As far as using ".internal." package names to denote non-public classes, > that works for new classes but I don't think it really works for existing > classes, which makes adopting it harder. > > Regards, > Mike > > On Tue, Aug 14, 2012 at 12:38 PM, Patrick Wendell <[EMAIL PROTECTED]>wrote: > >> We definitely need something like this, given that extensability is >> the expectation in Flume (not the exception). >> >> It's likely that the flume developer community will overlap with >> Hadoop developer community, so using a subset of the Hadoop >> annotations makes sense. @Private and @Public seem like a good fit for >> what we need right now. >> >> - Patrick >> >> On Tue, Aug 14, 2012 at 10:29 AM, Ralph Goers >> <[EMAIL PROTECTED]> wrote: >>> >>> On Aug 14, 2012, at 9:47 AM, Will McQueen wrote: >>> >>>> Something to consider: Eclipse uses the string 'internal' in their >> package >>>> path to denote packages (public or otherwise) that are not intended to >> be >>>> used as a public API: >>>> >>>> "All packages that are part of the platform implementation but contain >> no >>>> API that should be exposed to ISVs are considered internal >> implementation >>>> packages. All implementation packages should be flagged as internal, >> with >>>> the tag occurring just after the major package name. ISVs will be told >> that >>>> all packages marked internal are out of bounds. (A simple text search >> for >>>> ".internal." detects suspicious reference in source files; likewise, >>>> "/internal/" is suspicious in .class files). " >>> >>> FWIW, I also think this is a good idea. >>> >>> >>>> [Ref: >>>> >> http://wiki.eclipse.org/Naming_Conventions#Internal_Implementation_Packages>> ] >>>> >>>> Here are some additional links on evolving Java APIs: >>>> http://wiki.eclipse.org/Evolving_Java-based_APIs>>>> http://wiki.eclipse.org/Evolving_Java-based_APIs_2>>>> http://wiki.eclipse.org/Evolving_Java-based_APIs_3>>>> >>>>>> Flume configuration is actually pretty brittle >>>> FLUME-1051 might address this concern. >>>> >>> I'm not sure how. That issues seems more about guaranteeing validity >> than making the configuration provider more flexible. >>> >>> >>> Ralph >>> >> -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
-
Re: [DISCUSS] Annotating interface scope/stability in Flume
Brock Noland 2012-08-15, 07:30
Ignore this comment. I guess if we are going to use the annotations there is not need to use "internal" as well. On Wed, Aug 15, 2012 at 8:28 AM, Brock Noland <[EMAIL PROTECTED]> wrote: > Agreed it may not work for all existing classes we would like to make > internal. But it would work for things like FileChannel and MemoryChannel, > no? > > Brock > > > On Wed, Aug 15, 2012 at 8:20 AM, Mike Percy <[EMAIL PROTECTED]> wrote: > >> Yeah, it sounds reasonable to me to use a subset of the same annotations >> that Hadoop uses, such as @Public and @Private. >> >> As far as using ".internal." package names to denote non-public classes, >> that works for new classes but I don't think it really works for existing >> classes, which makes adopting it harder. >> >> Regards, >> Mike >> >> On Tue, Aug 14, 2012 at 12:38 PM, Patrick Wendell <[EMAIL PROTECTED] >> >wrote: >> >> > We definitely need something like this, given that extensability is >> > the expectation in Flume (not the exception). >> > >> > It's likely that the flume developer community will overlap with >> > Hadoop developer community, so using a subset of the Hadoop >> > annotations makes sense. @Private and @Public seem like a good fit for >> > what we need right now. >> > >> > - Patrick >> > >> > On Tue, Aug 14, 2012 at 10:29 AM, Ralph Goers >> > <[EMAIL PROTECTED]> wrote: >> > > >> > > On Aug 14, 2012, at 9:47 AM, Will McQueen wrote: >> > > >> > >> Something to consider: Eclipse uses the string 'internal' in their >> > package >> > >> path to denote packages (public or otherwise) that are not intended >> to >> > be >> > >> used as a public API: >> > >> >> > >> "All packages that are part of the platform implementation but >> contain >> > no >> > >> API that should be exposed to ISVs are considered internal >> > implementation >> > >> packages. All implementation packages should be flagged as internal, >> > with >> > >> the tag occurring just after the major package name. ISVs will be >> told >> > that >> > >> all packages marked internal are out of bounds. (A simple text search >> > for >> > >> ".internal." detects suspicious reference in source files; likewise, >> > >> "/internal/" is suspicious in .class files). " >> > > >> > > FWIW, I also think this is a good idea. >> > > >> > > >> > >> [Ref: >> > >> >> > >> http://wiki.eclipse.org/Naming_Conventions#Internal_Implementation_Packages>> > ] >> > >> >> > >> Here are some additional links on evolving Java APIs: >> > >> http://wiki.eclipse.org/Evolving_Java-based_APIs>> > >> http://wiki.eclipse.org/Evolving_Java-based_APIs_2>> > >> http://wiki.eclipse.org/Evolving_Java-based_APIs_3>> > >> >> > >>>> Flume configuration is actually pretty brittle >> > >> FLUME-1051 might address this concern. >> > >> >> > > I'm not sure how. That issues seems more about guaranteeing validity >> > than making the configuration provider more flexible. >> > > >> > > >> > > Ralph >> > > >> > >> > > > > -- > Apache MRUnit - Unit testing MapReduce - > http://incubator.apache.org/mrunit/> -- Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
|
|