Accumulo dev mailing list

Re: [DISCUSS] API changes to provide resource cleanup
All of our current code treats the Instance like a simple record:

* immutable, and therefore
* thread-safe
* provides several fields that describe an instance
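To make the "simple record" view concrete, here is a minimal sketch of a value type with that shape. `InstanceInfo` and its fields are hypothetical stand-ins for the kind of information an Instance describes, not the real `Instance` interface. Every field is final and set once in the constructor, so objects can be shared across threads without locking, and there is nothing to close.

```java
import java.util.Objects;

/**
 * Hypothetical value type illustrating the "simple record" view of an
 * instance: immutable, therefore thread-safe, with no lifecycle at all.
 */
final class InstanceInfo {
    private final String instanceName;
    private final String zooKeepers;
    private final int zkTimeoutMillis;

    InstanceInfo(String instanceName, String zooKeepers, int zkTimeoutMillis) {
        this.instanceName = Objects.requireNonNull(instanceName);
        this.zooKeepers = Objects.requireNonNull(zooKeepers);
        this.zkTimeoutMillis = zkTimeoutMillis;
    }

    String getInstanceName() { return instanceName; }
    String getZooKeepers() { return zooKeepers; }
    int getZkTimeoutMillis() { return zkTimeoutMillis; }

    /** "Changing" a field yields a new object; the original is untouched. */
    InstanceInfo withZkTimeoutMillis(int millis) {
        return new InstanceInfo(instanceName, zooKeepers, millis);
    }
}
```

Code written against this model can create such objects freely and simply drop them, which is exactly the habit that makes a later close() requirement hard to retrofit.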

When I tried to add calls to close() in our own code, I found that our
disregard for the lifetime of an instance was implicit everywhere, and it
probably is in all of our users' code, too.

If we want to do something like #1 (a closeable Instance), I think we'll
have to do it through a new API rather than by changing Instance, and then
deprecate Instance. The mental model is just completely different.
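One way to picture that different mental model: instead of retrofitting close() onto Instance, a new client type owns its resources from construction and implements `AutoCloseable`, so try-with-resources makes the lifetime visible in user code. `ClientSession` and its thread pool are hypothetical; this is a minimal sketch, not a proposed Accumulo API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical client type whose lifetime is explicit from the start. */
final class ClientSession implements AutoCloseable {
    private final String instanceName;
    // Background threads belong to this session, not to JVM-wide static state.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    ClientSession(String instanceName) {
        this.instanceName = instanceName;
    }

    String getInstanceName() {
        return instanceName;
    }

    boolean isClosed() {
        return pool.isShutdown();
    }

    /** Stops this session's threads so a container can unload the application. */
    @Override
    public void close() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }
}
```

Typical use would be `try (ClientSession session = new ClientSession("myInstance")) { ... }`, with the session's threads stopped when the block exits. Code that treats Instance as a plain record keeps working unchanged, because it never touches the new type.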

-Eric
On Thu, Jan 2, 2014 at 12:47 PM, Sean Busbey <busbey+[EMAIL PROTECTED]> wrote:

> Hey Folks!
>
> We need to come to some conclusions on what we're going to do for resource
> cleanup. I'll attempt to summarize the situation and the various options. If I
> missed something from our myriad of tickets and mailing list threads,
> please bring it up.
>
> Brief Background:
>
> The existing client APIs presume that a large amount of global state will
> persist for the duration of a JVM instance. This is at odds with lifecycle
> management in application containers, where a JVM is very long lived and
> user-provided applications are stood up and torn down. We have reports of
> this causing OOM on JBoss[1] and leaked threads on Tomcat[2].
>
> We have two possible solutions, both of which Jared Winick has kindly
> verified fix the problem we saw on JBoss.
>
> ----
> = Proposed solution #1: Closeable Instance
>
> The first approach adds a close() method to Instance so that users can say
> when they are done with a given instance. Internally, reference counting
> determines when we tear down global resources.
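The reference-counting idea can be sketched as follows. This is not the implementation already in the code base: `SharedResources`, `acquire`, and `release` are hypothetical names, and a single-thread executor stands in for whatever global resources (threads, caches, connections) the client actually holds.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical JVM-wide resources, torn down when the last user releases them. */
final class SharedResources {
    private static int refCount = 0;
    private static ExecutorService pool = null;

    private SharedResources() {}

    /** Called when an instance is created: start resources on first use. */
    static synchronized ExecutorService acquire() {
        if (refCount++ == 0) {
            pool = Executors.newSingleThreadExecutor();
        }
        return pool;
    }

    /** Called from an instance's close(): tear down after the last release. */
    static synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        if (--refCount == 0) {
            pool.shutdown();
            pool = null;
        }
    }

    static synchronized boolean isActive() {
        return pool != null;
    }
}
```

Making both methods `synchronized` is the simplest way to keep the count correct when instances are created and closed from several threads; whether a lock on every create and close is cheap enough is the kind of multithreading and performance question raised in this thread.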
>
> Advantages:
>   * Makes explicit in code where a client should do lifecycle management.
>   * Allows shutting down just some of the resources used.
>   * Is already in the code base.
>
> Disadvantages:
>   * Since lifecycle is getting added post-hoc, we are more likely to have
> maintenance issues as we find other side effects we hadn't considered, like
> the multithreaded issue that already came up[3].
>   * Changes Instance from representing static configuration to representing
> shared state.
>   * Doesn't work with the fluent style some of our APIs encourage.
>   * Closed semantics probably aren't enforced consistently (e.g. a user
> trying to use a BatchWriter that came from a now-closed instance should
> get a failure).
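As an illustration of consistently enforced closed semantics, a derived object can check its parent's state before every operation. `TrackedInstance` and `DerivedWriter` are hypothetical stand-ins for an Instance and a BatchWriter obtained from it; a real BatchWriter would also need to decide what happens to mutations already buffered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical closeable parent that derived objects consult. */
final class TrackedInstance implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    DerivedWriter createWriter() {
        ensureOpen();
        return new DerivedWriter(this);
    }

    /** Throws if close() has already been called. */
    void ensureOpen() {
        if (closed.get()) {
            throw new IllegalStateException("instance has been closed");
        }
    }

    @Override
    public void close() {
        closed.set(true);
    }
}

/** Hypothetical writer that fails fast once its parent instance is closed. */
final class DerivedWriter {
    private final TrackedInstance parent;
    private final List<String> buffer = new ArrayList<>();

    DerivedWriter(TrackedInstance parent) {
        this.parent = parent;
    }

    void add(String mutation) {
        parent.ensureOpen(); // reject work after the parent is closed
        buffer.add(mutation);
    }

    int size() {
        return buffer.size();
    }
}
```

A check like this is also what a test for "consistent enforcement of closing on objects derived from Instance" would exercise: close the parent, then assert that every derived object rejects further use.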
>
> To finish, we'd need to
>   * Verify multithreaded handling is done without too much of a performance
> impact[3]
>   * Finish making our internal use consistent with the lifecycle we're
> telling others to use[4]
>   * Possibly add tests to verify consistent enforcement of closing on
> objects derived from Instance
>
> = Proposed solution #2: Global cleanup utility, aka The Hammer
>
> As a band-aid that allows "unload resources" without changing the
> API, we instead provide a utility method that cleans up all global
> resources.
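A global cleanup utility can be sketched as one static method that shuts down every JVM-wide resource at once. `GlobalCleanup`, `newFixedPool`, and `shutdownAll` are hypothetical; the PoC referenced in this thread is not shown here, and a real utility would target Accumulo's actual static state rather than a registry of executors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical "hammer": tracks JVM-wide pools and stops all of them at once. */
final class GlobalCleanup {
    private static final List<ExecutorService> POOLS = new ArrayList<>();

    private GlobalCleanup() {}

    /** Library code creates its shared pools here so they can be found later. */
    static synchronized ExecutorService newFixedPool(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        POOLS.add(pool);
        return pool;
    }

    /**
     * All-or-nothing teardown: stops every registered pool, regardless of
     * which instance or application is still using it.
     */
    static synchronized void shutdownAll() {
        for (ExecutorService pool : POOLS) {
            pool.shutdownNow();
        }
        for (ExecutorService pool : POOLS) {
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        POOLS.clear();
    }
}
```

In a servlet container this would typically be called from the application's shutdown path, for example a `ServletContextListener.contextDestroyed` callback. The sketch also shows the "all-or-nothing" drawback: any other code in the JVM still using one of those pools loses it too.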
>
> Advantages:
>   * Doesn't change API or meaning for Instance
>   * Can be used on older Accumulo deployments without a patch/rebuild cycle
>
> Disadvantages:
>   * Only allows all-or-nothing cleanup
>   * Doesn't address our underlying lack of lifecycle
>   * Requires reverting the solution #1 changes already in the code base
>
> To finish, we'd need to
>   * revert commits from old solution (I haven't checked how many commits,
> but it's 6 tickets :/ )
>   * port code from PoC to main codebase (ASF grants, etc.) [6]
>   * add some kind of test (functional/IT?)
>
> -----
>
> We need to decide what we're going to provide as a placeholder for releases
> whose API is already frozen (i.e. 1.4, 1.5, 1.6*) as well as for the longer term.
>
> Personally, my position is that we should use the simplest change to handle
> the published versions (solution #2).
>
> Obviously there are outstanding issues with how we deal with global state
> and shared resources in the current client APIs. I'd like to see that
> addressed as a part of a more coherent client lifecycle rather than
> struggling to make it work while maintaining the current API. Long term, I
> think this means handling things in the updated client API Christopher has