Testing: Unit tests and auto-tests passed successfully. Ran a short (~2 hr) CI on a 6-node installation. Ran a brief (~1 hr) CI test on one machine with the newly released Hadoop 2.3.0. Built from the source tarball, and verified functionality with the binary tarball.
Since there are only very minor changes compared to 1.5.1-RC2, this vote will be open for the next 72 hours (until 2/28/2014 0100 UTC).
Upon successful completion of this vote, a 1.5.1 gpg-signed Git tag will be created from 3478f71a and the above staging repository will be promoted.
I ran a utility to analyze API diffs between 1.5.0 and 1.5.1-RC3. The configs I used are the two XML files in the parent directory of the report. I think the diff looks OK. I used jars from the 1.5.0 and 1.5.1-RC3 bin.tar.gz files.
Thanks for running that checker, Keith. Should we not be worried about the removal of InputFormatBase.RangeInputSplit? If I read correctly this will break both binary (runtime) compatibility and code (compile-time) compatibility. Can somebody make an argument for why this is not a problem in a minor release with no previous deprecation?
Is there a quick way to fix this, such as subclassing org.apache.accumulo.core.client.mapred.RangeInputSplit in an o.a.a.c.c.mapred.InputFormatBase.RangeInputSplit that we mark as deprecated?
Adam On Tue, Feb 25, 2014 at 5:17 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
I don't know that this inner class used for M/R should be considered public API... nor do I imagine it will cause compatibility problems if users aren't referencing it in their code (which there's no reason to expect them to). I don't know if anybody is subclassing RangeInputSplit, but I'd suspect that it's an acceptable risk. Re-adding an inner class that subclasses the now external one may be a good workaround. I don't think this would require recompilation for runtime compatibility, but if it does, I think that's probably acceptable.
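To make the suggested workaround concrete, a minimal sketch of such a shim might look like the following. This is hypothetical, not what was committed; it assumes the relocated top-level class is org.apache.accumulo.core.client.mapreduce.RangeInputSplit (the name mentioned later in this thread), and elides the rest of InputFormatBase:

```java
package org.apache.accumulo.core.client.mapred;

// Fragment of o.a.a.c.c.mapred.InputFormatBase: restore the 1.5.0 nested class
// as a deprecated shim over the relocated top-level RangeInputSplit, so code
// compiled against 1.5.0 that referenced the nested name keeps resolving.
public abstract class InputFormatBase<K,V> /* ... existing members elided ... */ {

  /**
   * Retained for backwards compatibility with 1.5.0 code that referenced the
   * nested class directly.
   *
   * @deprecated since 1.5.1; use
   *             {@link org.apache.accumulo.core.client.mapreduce.RangeInputSplit}
   */
  @Deprecated
  public static class RangeInputSplit
      extends org.apache.accumulo.core.client.mapreduce.RangeInputSplit {
    public RangeInputSplit() {
      super();
    }
  }
}
```

Whether this restores binary (as opposed to source) compatibility depends on how callers used the class; as noted above, requiring a recompile is probably acceptable here.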
I've run my verifier script several times on CentOS 6.5, Fedora 19, and Fedora 20 (at least half a dozen runs now, and at least twice on each), to verify the following:
All signatures and hashes are good, including the RPM sigs. Jars have sources/javadocs, are sealed, and match the binary tarball contents. Source tarball matches the git tag (3478f71a), and builds with and without tests with the configured hadoop.profile=1 and hadoop.profile=2.
Some of the integration tests time out occasionally on a CentOS 6.5 build in a VM, but I'm not too concerned about the time-sensitivity in those tests.
Did the following tests on CentOS 6.5, Hadoop 1.2.1, ZK 3.4.5 (3GB standalone, native maps), OpenJDK 1.6:
- Started/stopped a single-node cluster, created/deleted tables, ran test/verify ingest (test/system/test1)
- Write performance: 5 threads * ~40k mut/sec = ~200k mut/sec
- Read performance: 5 threads * ~109k rec/sec = ~545k rec/sec
I did run into a strange bug where I couldn't start the master... the java process would just die unexpectedly while setting up log4j, without logging anything to stdout or stderr. The problem went away after a reboot, so I wasn't able to reproduce or debug it, and I don't think it's an Accumulo bug (probably a kernel or OpenJDK bug).
On Thu, Feb 27, 2014 at 6:16 PM, Christopher <[EMAIL PROTECTED]> wrote: I am seeing slow write speeds when running test/system/test1/ingest_test.sh with a single-node Hadoop 2.2.0. I have confirmed this is ACCUMULO-1905; when I increase tserver.mutation.queue.max to 4M, write rates are around 200k mut/sec. When tserver.mutation.queue.max is at the default value, it writes at around 1/5th that speed. I added some notes to ACCUMULO-1905.
I have not voted because of the Hadoop 2.2.0 performance issue. I spoke with Christopher about this; he suggested that if we had release notes, this would not be a problem. So it seems like our options for 1.5.1-RC3 and ACCUMULO-1905 are the following:
1. Ignore it.
2. Create release notes that prominently mention the Hadoop 2.2.0 performance issue and workarounds.
3. Create a 1.5.1-RC4 with a different default for tserver.mutation.queue.max.
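For reference, the workaround behind option 2 amounts to overriding one property in accumulo-site.xml. The 4M value is the one tested earlier in this thread; the right setting may vary per deployment:

```xml
<!-- accumulo-site.xml: raise the mutation queue limit to restore write
     throughput on Hadoop 2.x (ACCUMULO-1905). 4M is the value tested above;
     note the possible extra tserver memory usage discussed below. -->
<property>
  <name>tserver.mutation.queue.max</name>
  <value>4M</value>
</property>
```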
I don't mind creating the release notes. I am still puzzling over what the best option is for the long term.
On Mon, Feb 24, 2014 at 8:01 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
I was actually planning to write up some release notes after the fact (probably throw them up by the download link).
I am on my phone, but I thought the mutation queue size affected 1.5 and greater (read as hdfs wal), not just hadoop 2.2.0. My memory could also just be failing me. On Feb 27, 2014 7:38 PM, "Keith Turner" <[EMAIL PROTECTED]> wrote:
Looks like hsync (HDFS-744) has a fix version of 2.0.2. There was some talk about getting that into the 1.0 line, but I don't see a resolution for that. I'm also not sure how sync was done (if at all reliably) in the Hadoop 1 line. On Feb 27, 2014 8:28 PM, [EMAIL PROTECTED] wrote:
On Thu, Feb 27, 2014 at 8:29 PM, Josh Elser <[EMAIL PROTECTED]> wrote: OK, I think that's the way to go. I was thinking back on why I did not change the default in 1.5 when I changed it for 1.6. I think I was concerned about destabilizing a Hadoop 1 + Accumulo 1.5.0 setup after upgrade. I think if we changed the default we would still need release notes warning people about the possibility of additional memory usage after upgrade.
One thing I would like to do but have not had time is to create an Accumulo maven project that depends on 1.5.1, configure maven to use the staging repo, and build and run the test. Has anyone else tried this? On Mon, Feb 24, 2014 at 8:01 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
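A minimal way to try this would be a throwaway project whose POM pulls from the staging repo. This is a sketch; the repository URL is a placeholder for the actual staging repository URL from the vote email:

```xml
<!-- pom.xml fragment: resolve the staged 1.5.1 artifacts before promotion.
     The <url> below is a placeholder for the staging repo from the vote email. -->
<repositories>
  <repository>
    <id>accumulo-1.5.1-staging</id>
    <url>STAGING_REPOSITORY_URL_FROM_VOTE_EMAIL</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.1</version>
  </dependency>
</dependencies>
```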
Alright guys and gals. Given Keith's last comment, I think we can call this.
The VOTE for rc3 as Apache Accumulo 1.5.1 passes with 4 +1s and nothing else.
I'm out of town this week so this won't be promoted until late next week. Please consider things you would want listed in some release notes and let me know. I'll promote the artifacts when I return and get some release notes written.
Thank you everyone who made the time to be a part of this process. On Feb 28, 2014 10:35 AM, "Eric Newton" <[EMAIL PROTECTED]> wrote:
If you can go ahead and promote the staging repo and create the signed tag in git, somebody else can upload the artifacts to the dist repo, so they can be distributed to the mirrors before we create the release announcement/release notes and update the website.
Sorry to necro this thread, just wanted to throw my 2 cents in.
We had some user code referencing this code directly, and our application no longer works in 1.5.1. Just found out today when installing on 1.5.1. In retrospect, we should have been using listSplits from TableOperations, but instead we were using the RangeInputSplit method to get the splits for a table.
I guess since we probably shouldn't have been doing that, I don't know if that's a case for keeping this from being deleted without a deprecation cycle... but we did have a nasty surprise, and a deprecation warning would have been nice.
-d On Tue, Feb 25, 2014 at 11:33 PM, Adam Fuchs <[EMAIL PROTECTED]> wrote:
Donald Miner Chief Technology Officer ClearEdge IT Solutions, LLC Cell: 443 799 7807 www.clearedgeit.com
I'm starting to dig around for a workaround and figured someone might be able to help me right away.
In digging deeper, we were using RangeInputSplit because it gave us the splits AND the locations. We use the locations for data-locality placement in our distributed application. listSplits only gives us splits.
Is there an easy way to get both of these pieces of information together? On Thu, Mar 27, 2014 at 3:28 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
The changes added more information inside of RangeInputSplit, and the class was moved to its own Java file instead of being nested inside InputFormatBase. The package did not change -- I imagine you may have needed to recompile your application first, but... was there more than that?
That breaks both source and binary compatibility. In this specific case, making things compatible again isn't hard, but I want to make sure there aren't other changes that he needs. On Fri, Mar 28, 2014 at 11:41 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
But, like I mentioned in the other message, I don't think binary compat was achieved, but the package name, constructors, and methods existing in 1.5.0 were maintained AFAIK. Are we asserting binary compat here as well?
I'm trying to understand if we actually didn't follow our own rules, or if the expectations of the community are exceeding the rules we have for ourselves. I think we're in the latter right now.
Also, reading back through this chain, it was stated as unclear whether or not an inner class of a class in the public API is also, itself, in the public API.
This should also be clarified in our definition of the public API in the README. Obviously, Don and Sean both agree that it should be. Those discussing the vote didn't. Doesn't really matter to me either way.
Ah, if all I need to do is change the class name to org.apache.accumulo.core.client.mapreduce.RangeInputSplit .... I feel kind of dumb. I didn't realize it was renamed. I can do that.
On a separate note (maybe more appropriate for the user list), but keeping it in here for continuity's sake:
We have an application that has daemons running on every Hadoop node. When we kick off a query in our application, this is what happens:
- we figure out what splits there are on the table we want and what tablet servers those splits are on
- we tell our application's daemons to do a range scan across the tablets that are colocated on the same node (1 range scan per tablet)
- daemon processes data
- etc.
So, we need both the split information and the locality information to get this to work right.
Is there a better way to do this? It seems like teasing out information from an inputformat class seems like kind of a hack. We could use TabletLocator, but that's not in the public API either? Is there a right way for a client to get locality information?
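For what it's worth, one way to get both pieces of information without touching internals is to call getSplits on a configured AccumuloInputFormat and read the locations off each split. This is a sketch against the 1.5 mapreduce API; the connection details (instance name, zookeepers, user, password, table) are placeholders:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.mapreduce.RangeInputSplit;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;

// Sketch: obtain split ranges plus their tablet-server locations using only
// the MapReduce input format API (Accumulo 1.5.x). All connection parameters
// below are hypothetical placeholders.
public class SplitLocations {
  public static void main(String[] args) throws Exception {
    Job job = new Job();
    AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("pass"));
    AccumuloInputFormat.setZooKeeperInstance(job, "instance", "zk1:2181");
    AccumuloInputFormat.setInputTableName(job, "mytable");

    List<InputSplit> splits = new AccumuloInputFormat().getSplits(job);
    for (InputSplit s : splits) {
      RangeInputSplit ris = (RangeInputSplit) s;
      // getRange() is the split's range; getLocations() the hosting tserver(s)
      System.out.println(ris.getRange() + " -> " + Arrays.toString(ris.getLocations()));
    }
  }
}
```

The split-to-location mapping is a point-in-time hint (tablets migrate), so a long-lived daemon would want to refresh it rather than cache it.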
-d On Fri, Mar 28, 2014 at 1:12 PM, Sean Busbey <busbey+[EMAIL PROTECTED]> wrote:
Can someone make a 1.6 ticket to clarify this confusion in the README?
There is undeniable confusion to date, but it doesn't seem like anyone minds including public nested classes either. I'd have to scan over the public members of these classes to make sure we don't inadvertently advertise something we don't intend to. On Mar 28, 2014 12:17 PM, "Bill Havanki" <[EMAIL PROTECTED]> wrote: