ACCUMULO-2143 has developed a conversation about MiniAccumuloCluster's intended use and the way we currently implement the difference between MAC for external use and MAC for internal Accumulo testing.
In particular, Josh had a few major concerns It doesn't make sense to me why MiniAccumuloCluster is a concrete class which, ultimately still tied to a MiniAccumuloClusterImpl. MiniAccumuloCluster *requires* a MiniAccumuloClusterImpl or something that extends it. This is what's really chafing me about the separation of "accumulo user" and "accumulo developer" methods – you *always* have them both. Not to mention, this hierarchy is really obnoxious to create a new implementation of AccumuloMiniCluster(Impl) because I have to carry all of the cruft of the "original" implementation with me.
Bringing this back around to this ticket, while I still don't agree with the reasoning that exposing the FileSystem or ZooKeeper object that MiniAccumuloClusterImpl is getting us anything other than the ability to say "we didn't change this [arbitrary] API". For "users" who might not care what the underlying FileSystem or ZooKeeper connection, it's merely an extra two items in their editor's code-completion. For "users" who would care to use this information, we now make them jump through extra hoops to get it. That just doesn't make any sense to me for something we haven't even released.
To be honest, I really want to re-open ACCUMULO-2151<https://issues.apache.org/jira/browse/ACCUMULO-2151>, make MiniAccumuloCluster an interface, MiniAccumuloClusterImpl an implementation of said interface, and create some factory class to make instances, ala Connector.tableOperations, Connector.securityOperations, etc. Right now there's a class we call an "API" that cannot be generically extended for the sake of saying "we have an API". I wanted to avoid having a drawn out discussion on a jira, where folks my not notice it. Especially with things being late in 1.6.0 development and the potential this has to impact the API.
Personally, I don't have much of a dog in the fight. There's always some arbitrary line for where the public API will be, presuming we want to have any kind of balance between providing a stable based for others to build on and being able to refactor things. I would like us to hold to our API promises and I would rather we not leak implementation details unnecessarily.
I suspect the choice to make MiniAccumuloCluster a class rather than an interface with a factory was driven by the restrictions we put on API changes between major versions and the fact that 1.5 had a class you could instantiate via constructors.
On Wed, Mar 26, 2014 at 10:38 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
We already declared minicluster public in 1.5.0 and 1.5.1, so we're a bit bound as is. We don't clarify in our docs if we maintain source compatibility or binary compatibility. I would caution that users tend to presume both (and maybe binary a little more), based on the continued dismay that happens around the Hadoop 1 -> Hadoop 2 Class/Interface stuff.
There were many change made to MAC so Accumulo could test itself. For example a method was added to return the internal threads that flush logs. Flushing the logs may be a useful feature to add. However it could be offered in a way that does not expose these internal threads. When working on ACCUMULO-2151 I had no desire to reimplement things like this, I just wanted to hide it. It was hidden from users so we do not have to support it and can change it at will when testing 1.7.0.
As Sean said MAC was a class in 1.4.4, 1.5.0, and 1.5.1. So making it an interface would break things for any users using it. Any reorganizing of the implementation of MAC could easily be done after 1.6.0. From a users perspective the MAC API has changed very little, even though the implementation has dramatically changed. On Wed, Mar 26, 2014 at 3:10 AM, Sean Busbey <busbey+[EMAIL PROTECTED]>wrote:
On Wed, Mar 26, 2014 at 11:06 AM, Keith Turner <[EMAIL PROTECTED]> wrote: One minor point of clarification: while the MAC was backported to 1.4.4+ it was not made a part of the public API in those releases.
I have been presuming this was an intentional compromise position. I like the current situation and would prefer we have major additions like the MAC released for a version before adding them to the public api (even though this specific case was a retrofit). Still, if our intention is for MAC in 1.4 to be a part of the support api, I'll need to file a ticket and fix it.
On Wed, Mar 26, 2014 at 11:26 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
I would not want this added to the public API.
IMO, ACCUMULO-2151 has two key goals
1) keep the public API simple and well-defined for the use of people testing against the rest of our public API
2) allowing us to leverage the codebase for our internal testing of Accumulo.
expanding the former to help the later doesn't get us anywhere. It just binds up what we can change in the thing we use for testing because we exposed it to the class of users who rely on us to have a boundary for slowing down changes.
On Wed, Mar 26, 2014 at 12:26 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
Why not? It needlessly exposes a MAC implementation detail. Java 7 offers a much better way to handle this situation and makes the need for these threads go away. As I said flushing the logs could be offered in the API in a much nicer way. Thats one solution.
Does MiniAccumuloC*Impl leak from the minicluster package in some way?
On Wed, Mar 26, 2014 at 11:17 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
That helps, but still breaks binary compatibility. The class files compiled against the original version will still throws an IncompatibleClassChangeError when they attempt to get back instances from the factory.
On Wed, Mar 26, 2014 at 11:46 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
AFAICT, it's used in internal tests (that are in a different package) to make sure things have been flushed to disk before verifying internal state (because checking that state as files in HDFS is simpler then walking in-memory representations)
[NOTE: I started this email when this thread was new, and it kinda of blew up on me while writing it and being distracted. Apologies in advance if things were already covered or it's not relevant any more.]
Is this a design quality discussion or a a functionality discussion?
The changes from 1.5->1.6 seem like a poor design decision, but they do aid in functionality.
From 1.5: public MiniAccumuloCluster(File dir, String rootPassword) throws IOException public MiniAccumuloCluster(MiniAccumuloConfig config) throws IOException public void start() throws IOException, InterruptedException public String getInstanceName() public String getZooKeepers() public void stop() throws IOException, InterruptedException
From 1.6: public MiniAccumuloCluster(File dir, String rootPassword) throws IOException public MiniAccumuloCluster(MiniAccumuloConfig config) throws IOException public void start() throws IOException, InterruptedException public Set<Pair<ServerType,Integer>> getDebugPorts() public String getInstanceName() public String getZooKeepers() public void stop() throws IOException, InterruptedException public MiniAccumuloConfig getConfig() public Connector getConnector(String user, String passwd) throws AccumuloException, AccumuloSecurityException public ClientConfiguration getClientConfig()
From a client perspective, I see a difference of #getDebugPorts, #getConfig, #getConnector, #getClientConfig. The other methods are on the Impl. There's nothing wrong with using aggregation in this case, since the code would be the same regardless.
I don't quite understand what it means to "extend generically." At this point, the MiniAccumuloCluster's interface of the MiniAccumuloClusterImpl's interface. The naming could, and should, be better, but I don't quite get where we're losing functionality.
On Wed, Mar 26, 2014 at 12:06 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
On Wed, Mar 26, 2014 at 12:44 PM, Sean Busbey <[EMAIL PROTECTED]> wrote: NM then. I thought we did something like this w/ Connector, but I see now its an abstract class. So it was changed from a concrete class to an abstract class.
On Wed, Mar 26, 2014 at 12:46 PM, Josh Elser <[EMAIL PROTECTED]> wrote: Can you give an example of what you are thinking of? I don't understand you viewpoint either
As long as we preserve the correct signatures on the public methods in o.a.a.minicluster, we can make any changes we like to the implementation of those methods. The implementation of those methods happen to use classes in o.a.a.minicluster.impl. No classes from o.a.a.minicluster.impl should leak through the public methods in o.a.a.minicluster. I was asking if there was leakage there.
I think it would be best if you could throw a different MACConfig at it to have it vary in behavior, rather then different implementations. I'd like to think that this would provide the most backward compatibility and ease of use, but I could be mistaken. On Wed, Mar 26, 2014 at 2:05 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
What you are trying to do sounds interesting. It also sounds experimental and in the early stages. Is there anything specific you think should be done for 1.6.0 w/ regards to MAC API? On Wed, Mar 26, 2014 at 2:26 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
I've been watching the conversation on the side, but I wanted to mention that it seems the focus isn't so much on "mini" clusters anymore. You're thinking of programmatic cluster management, whether one node or many. The idea of a basic cluster management interface, with MAC as an implementation, is promising. A package name of just "cluster" could work.
Carry on :)
Bill H On Fri, Mar 28, 2014 at 12:39 AM, Sean Busbey <busbey+[EMAIL PROTECTED]>wrote: // Bill Havanki // Solutions Architect, Cloudera Govt Solutions // 443.686.9283
I don't think any of this should be done for 1.6.0, but I like the idea of creating a separate cluster interface for testing. I think it should be integrated into the accumulo-maven-plugin, also. I think the idea should be hammered out, and tested as a separate thing, to experiment with the options, and provided as a complete feature for the next major release. If it would change packaging dependencies, it shouldn't even be done for 1.6.x bugfix releases.
Not even the addition of a new interface, Christopher? I'd very much like to have an interface that we can get in 1.6.0 at a minimum. I wouldn't even push for any deprecation of what's currently in place. On Mar 28, 2014 10:02 AM, "Christopher" <[EMAIL PROTECTED]> wrote:
On Fri, Mar 28, 2014 at 3:14 PM, Josh Elser <[EMAIL PROTECTED]> wrote: W/o deprecation it seems very confusing. The intent is that users should use the new one, but the old one is not deprecated. If someone is completely new to this, how will they know which option to use?
Once you get down in the weeds of working on this, do you think you might end wanting something very different?
Forgot to also add, that I would add the experimental annotation to alleviate confusion.
The already mocked minimum set of methods on the interface which I posted to github Is a first pass. If we miss something that is in fact common, we can add it later, anything else is likely destined for the implementation.
On Friday, March 28, 2014, Keith Turner <[EMAIL PROTECTED]> wrote:
I think this is better reserved for a version later than 1.6.0. It's an 11th hour change in addition to being a large overhaul of the interfaces to support functionality we never intended for 1.6.0. On Fri, Mar 28, 2014 at 4:04 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
But... without more time to fully develop the requirements for the interface, with a few implementations, it's probably going to change anyway. I think even adding the interface could complicate the follow-on work. But... *shrug*.... maybe you can have guarantees that the interface will stay as is (same package, same methods, same name, etc.)?
I'll have something integrated to 1.6.0 this weekend. Something concrete may help to firm everyone's opinion. On Mar 28, 2014 5:54 PM, "Christopher" <[EMAIL PROTECTED]> wrote:
NEW: Monitor These Apps!
Apache Lucene, Apache Solr and all other Apache Software Foundation project and their respective logos are trademarks of the Apache Software Foundation.
Elasticsearch, Kibana, Logstash, and Beats are trademarks of Elasticsearch BV, registered in the U.S. and in other countries. This site and Sematext Group is in no way affiliated with Elasticsearch BV.
Service operated by Sematext