HBase >> mail # dev >> on HBase 1.0


-User: this is really a core-dev technical treatise :)

Some things to think about before we collaborate next Tuesday.  I think
some questions need to be really clearly understood before 1.0...

What does 1.0 mean?
- Feature complete for the majority of use cases?  Ultra-stable?  Something
long-lasting?  I'm guessing all of the above, with the emphasis on lasting.

Are we feature complete for the 80% case?
- I know that we need & are still developing hourly snapshots.
- How confident are we about cross-DC replication? We're currently working
on master-master, which is a requirement for many of our future use cases.
- Do we have confidence in a finalized CoProcessor API?  Would it be worth
one more rev of iteration once people use it en masse?  I know we've gone
over this on the public JIRA, but it makes a big difference once everyone
feels it's safe enough to touch.  APIs are always tricky & iterative.
- What about hbck -fix?  This is a requirement for us.  Are there other
scripts we should write to repair a broken system?  What about repairing
HBase's various uses of ZK?
- When can we deprecate the old 'mapred' user API in favor of the
'mapreduce' one?  Having both is confusing.
- How do we feel about the Thrift server?  It seems like everyone has
their own customizations here.  Seems like performant & stable
multi-language support would be critical for a 1.0.
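On the hbck point above, the repair workflow I have in mind looks roughly
like this (these are the flags in today's hbck; they may well change before
1.0, and a real "broken system" runbook would need more than this):

```shell
# Report-only pass: check META, region assignment, and the HDFS
# layout without changing anything.
hbase hbck

# Verbose report, useful when filing a JIRA about an inconsistency.
hbase hbck -details

# Repair pass: attempt to fix assignment & META inconsistencies.
hbase hbck -fix
```

Part of the 1.0 question is whether we trust -fix enough to tell operators
to run it unattended, or whether it stays a "read the -details output
first" tool.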

How confident are we in telling people to use HBase?
- When users come to us with questions, do we normally point them to the
HBase book or some known material?
- Do we understand how to design an optimal schema?
- Do we understand options for server partitioning and hardware setup?
- What is the optimal way to create a table?
- What are recommended config settings to look at?  Why are they
recommended?
- What configs does a novice user look at?  What configs does a power
user (not a developer) look at?
+ I think Lars' HBase book has been a huge help here.  Aligning our 1.0
goals & recommendations with the 2nd edition of his book seems critical.
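On the schema and table-creation questions, here's a small sketch of the
kind of guidance I think a 1.0-era doc needs to spell out: salting row keys
to avoid region hotspotting, reverse timestamps for newest-first scans, and
computing split points for a pre-split table.  (This is a pure-Python
illustration of the patterns, not HBase client code; the function names and
defaults are mine.)

```python
import hashlib
import struct

LONG_MAX = 2**63 - 1  # java.lang.Long.MAX_VALUE, the usual reference point

def salted_key(user_id, n_buckets=16):
    """Prefix the key with a stable hash bucket so monotonically
    increasing IDs spread across regions instead of hotspotting
    a single region server."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % n_buckets
    return b"%02d-%s" % (bucket, user_id.encode())

def reverse_ts_key(user_id, epoch_millis):
    """Suffix the key with (LONG_MAX - timestamp), big-endian, so a
    scan from the row prefix returns the newest events first."""
    return user_id.encode() + struct.pack(">q", LONG_MAX - epoch_millis)

def split_points(n_regions, key_space=256):
    """Evenly spaced single-byte split points for pre-splitting a
    table whose keys begin with a hashed byte."""
    step = key_space // n_regions
    return [bytes([i * step]) for i in range(1, n_regions)]
```

Because the salt bucket is a deterministic hash of the id, point reads stay
cheap; full time-ordered scans, on the other hand, fan out into n_buckets
ranges, which is exactly the kind of tradeoff the schema doc should call
out explicitly.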

In general, I think that announcing a 1.0 will mean that we will attract
more people, but also more finicky users that will be upset if they have
to look at the debug logs much & won't understand why it doesn't "just
work".  I think that's where consulting companies will come in, help, & be
happy.  I'm a little worried that region off-lining issues still crop up
from time to time, but I don't have experience with the new master.

In general, I wonder if it would be better to wait until 0.96 (one more
FF) before announcing 1.0.  I also wonder if it would be better to
stabilize on an RC and label it 1.0 post-release, once everything is
smooth.  As an example, HDFS 0.20.205 is really the best HDFS 1.0.  Then
again, maybe acting like HBase 0.94 will inevitably be 1.0 is a way to get
various groups to focus.