HBase >> mail # dev >> on HBase 1.0


-User : this is really a core dev technical treatise :)

Some things to think about before we collaborate next Tuesday.  I think
some questions need to be really clearly understood before 1.0...

What does 1.0 mean?
- Feature complete for majority of use cases?  Ultra stable?  Something
that is long lasting?  I'm guessing all of the above, emphasis on lasting.

Are we feature complete for the 80% case?
- I know that we need & are still developing hourly snapshots.
- How confident are we about cross-DC replication? We're currently working
on master-master, which is a requirement for many of our future use cases.
- Do we have confidence about a finalized CoProcessor API?  Would it be
nice to have one rev of iteration on this once people use it en masse?  I
know we've gone over this on the public JIRA, but it makes a big
difference once everyone feels like it's safe enough to touch.  APIs are
always tricky & iterative.
- What about HBCK -Fix?  This is a requirement for us.  Are there other
scripts that we should write to repair a broken system?  What about repair
of various ZK uses?
- When can we deprecate the old 'mapred' user API?  That's confusing.
- How do we feel about the Thrift server?  It seems like everyone has
their own customizations here.  Seems like performant & stable
multi-language support would be critical for a 1.0.

How confident are we in telling people to use HBase?
- When users come to us with questions, do we normally point them to the
HBase book or some known material?
- Do we understand how to design an optimal schema?
- Do we understand options for server partitioning and hardware setup?
- What is the optimal way to create a table?
- What are recommended config settings to look at?  Why are they
recommended?
- What configs does a novice user look at?  What configs does a power
user (not developer) look at?
+ I think Lars' HBase book has been a huge help here.  Aligning our 1.0
goals & recommendations with the 2nd edition of his book should be critical.

In general, I think that announcing a 1.0 will mean that we will attract
more people, but also more finicky users that will be upset if they have
to look at the debug logs much & won't understand why it doesn't "just
work".  I think that's where consulting companies will come in, help, & be
happy.  I'm a little worried about the fact that there are region
off-lining issues from time to time, but I don't have experience with the
new master.

In general, I wonder if it would be better to wait until 0.96 (1 more FF)
before announcing 1.0.  I also wonder if it would be better to stabilize
on a RC and label it 1.0 post-release when everything is smooth.  As an
example, HDFS 0.20.205 is really the best HDFS 1.0.  Then again, maybe
acting like HBase 0.94 will inevitably be 1.0 is a way to get various
groups to focus.