Thanks J-D, that's very helpful.
Your information about Thrift is great. Our primary development language is
C# so using Thrift will allow us to connect our existing code with HBase and
the penalty seems low enough to be worth it.
Unfortunately, that is also our Achilles' heel: we are far from being Java
experts, and it will probably take us a long time to reach the point where
we can debug and fix problems the way you do. My thinking was to build two
independent clusters with cyclic replication, so that if one crashes we can
switch to the other while we figure out how to fix the first. However, doing
that requires solid replication capabilities. Can I take from your
description that you have cyclic, selective replication working in
production already? I see that it is scheduled for release in 0.21; is it
possible to get it working on 0.20?
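For reference, here is my understanding of what that setup would look like
once replication ships; the shell syntax below is an assumption based on the
feature's design, not something we have running:

```
# Assumed syntax, for illustration only: mark a column family for
# replication, then register each cluster as a peer of the other
# (cyclic). The peer id and ZooKeeper quorum address are made up.
hbase> disable 'mytable'
hbase> alter 'mytable', {NAME => 'cf', REPLICATION_SCOPE => 1}
hbase> enable 'mytable'
hbase> add_peer '1', "other-cluster-zk:2181:/hbase"
```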
As for the issue with shutting down the master node, what I see is that
running "hbase-daemon.sh stop master" keeps printing dots forever. Looking
at the code for that script, it ends up running "./hbase master stop". If I
run that command manually, it seems to ignore the stop parameter and tries
to load another instance of the server, which fails in my case because the
server is already running and the JMX port is busy. There is nothing in the
log, and the .out file contains only the exception thrown when JMX tries to
bind to the busy socket.
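To illustrate what I expected the stop path to do, here is a minimal
pid-file based stop in shell; the file names and timeout behaviour are my
assumptions for a sketch, not the actual hbase-daemon.sh code:

```shell
#!/bin/sh
# Sketch of a pid-file based daemon stop (assumptions, not the real
# hbase-daemon.sh): read the pid, send SIGTERM, print dots while
# waiting, and escalate to SIGKILL after a timeout.

stop_daemon() {
  pidfile=$1
  if [ ! -f "$pidfile" ]; then
    echo "no pid file: $pidfile"
    return 1
  fi
  pid=$(cat "$pidfile")
  kill "$pid" 2>/dev/null
  tries=0
  while kill -0 "$pid" 2>/dev/null; do
    printf '.'
    sleep 1
    tries=$((tries + 1))
    # Stop waiting politely after ~10s and force-kill.
    if [ "$tries" -ge 10 ]; then
      kill -9 "$pid" 2>/dev/null
    fi
  done
  echo " stopped"
  rm -f "$pidfile"
}

# Demo with a throwaway background process standing in for the master.
pidfile=$(mktemp)
sleep 60 &
echo $! > "$pidfile"
stop_daemon "$pidfile"
```

In our case the dots never stop, which matches what I described above: the
second invocation dies on the JMX port instead of ever signalling the
running master.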
Thanks again, I really appreciate the information.
On Thu, Mar 4, 2010 at 20:28, Jean-Daniel Cryans <[EMAIL PROTECTED]>wrote:
> > 1. I assume you've seen this benchmark by Yahoo (
> > http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf and
> > http://www.brianfrankcooper.net/pubs/ycsb.pdf). They show three main
> > problems: latency goes up quite significantly when doing more
> > operations/sec are capped at about half of the other tested platforms
> > adding new nodes interrupts the normal operation of the cluster for a
> > Do you consider these results a problem, and if so, are there any plans
> > to address them?
> Please see our answer
> http://www.search-hadoop.com/m?[EMAIL PROTECTED]
> > 2. While running our tests (most were done using 0.20.2) we've had a
> > incidents where a table went into "transition" without ever going out
> of it.
> > We had to restart the cluster to release the stuck tables. Is this a
> > known issue?
> 0.20.3 has a much better story, 0.20.4 will include even more reliability
> > 3. If I understand correctly then any major upgrade requires completely
> > shutting down the cluster while doing the upgrade as well as deploying
> a new
> > version of the application compiled with the new version client? Did I
> > it correctly? Is there any strategy for upgrading while the cluster is
> > running?
> Lots of different reasons why: Hadoop RPC is versioned, a new Hadoop
> major version requires filesystem upgrades, etc.
> So for HBase, you currently can do rolling restarts between minor
> versions until told otherwise (in the release notes). See
> Also Hadoop RPC will probably be replaced in the future with Avro and
> by then all releases should be backward compatible (we hope).
> > 4. This is more a bug report than a question but it seems that in
> > the master server doesn't stop cleanly and has to be killed manually.
> > Is someone else seeing it too?
> Can you provide more details? Logs and stack traces appreciated.
> > 5. Are there any performance benchmarks for the Thrift gateway? Do you
> > have an estimate of the performance penalty of using the gateway
> compared to
> > using the native API?
> The good thing with Thrift servers is that they have long-lived
> clients, so their cache is always full and HotSpot does its magic. In
> our tests (we use Thrift servers in production here at StumbleUpon),