-Re: on HBase 1.0
Nicolas Spiegelberg 2011-11-21, 16:31
-User : this is really a core dev technical treatise :)
Some things to think about before we collaborate next Tuesday. I think
some questions need to be really clearly understood before 1.0...
What does 1.0 mean?
- Feature complete for majority of use cases? Ultra stable? Something
that is long lasting? I'm guessing all of the above, emphasis on lasting.
Are we feature complete for the 80% case?
- I know that we need & are still developing hourly snapshots.
- How confident are we about cross-DC replication? We're currently working
on master-master, which is a requirement for many of our future use cases.
- Do we have confidence about a finalized CoProcessor API? Would it be
nice to have one rev of iteration on this once people use it en masse? I
know we've gone over this on the public JIRA, but it makes a big
difference once everyone feels like it's safe enough to touch. APIs are
always tricky & iterative.
- What about HBCK -Fix? This is a requirement for us. Are there other
scripts that we should write to repair a broken system? What about repair
of various ZK uses?
- When can we deprecate the old 'mapred' user API? That's confusing.
- How do we feel about the Thrift server? It seems like everyone has
their own customizations here. Seems like performant & stable
multi-language support would be critical for a 1.0.
How confident are we in telling people to use HBase?
- When users come to us with questions, do we normally point them to the
HBase book or some known material?
- Do we understand how to design an optimal schema?
- Do we understand options for server partitioning and hardware setup?
- What is the optimal way to create a table?
- What are recommended config settings to look at? Why are they
- What configs does a novice user look at? What configs does a power
user (not developer) look at?
+ I think Lars' HBase book has been a huge help here. Aligning our 1.0
goals & recommendations with the 2nd edition of his book should be critical.
In general, I think that announcing a 1.0 will mean that we will attract
more people, but also more finicky users that will be upset if they have
to look at the debug logs much & won't understand why it doesn't "just
work". I think that's where consulting companies will come in, help, & be
happy. I'm a little worried about the fact that there are region off-lining
issues from time to time, but I don't have new master experience.
In general, I wonder if it would be better to wait until 0.96 (1 more FF)
before announcing 1.0. I also wonder if it would be better to stabilize
on a RC and label it 1.0 post-release when everything is smooth. As an
example, HDFS 0.20.205 is really the best HDFS 1.0. Then again, maybe
acting like HBase 0.94 will inevitably be 1.0 is a way to get various
groups to focus.