Joe Pallas 2012-08-21, 01:18
Stack 2012-08-21, 16:29
Joe Pallas 2012-08-22, 23:06
We have tried a few different things wrt the C++ clients and thrift. Just
putting out some of our thoughts here.
First, we used the existing Thrift proxy as a separate tier (Thrift proxy
tier). The issue there was that we just didn't get enough throughput (for
various reasons). Independently, adoption of HBase from C++ was increasing
- so we thought it made sense to write a native client.
So we wrote the native C++ client and embedded the thrift proxy into the
region server (embedded thrift proxy). Cutting the redirect from the
client was one gain (as the native client is a smart client), but the real
advantage came from short-circuiting the flow. In the thrift proxy tier
case, the Thrift client would talk to the proxy using Thrift
serialization, proxy would deserialize the Thrift call and re-serialize it
into the Java client format, then send it to the region server which would
deserialize the java formatted buffers again. But in the embedded proxy +
native client, we can short-circuit on the embedded proxy and make a
function call to the region server which is running in the same JVM (which
helps cut one round of serialization and deserialization).
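
To make the comparison above concrete, here is a minimal sketch that simply counts the serialization and deserialization steps on each path. The Step labels and function names are hypothetical and only model the flow as described; nothing here is taken from the actual proxy or client code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical labels for the kinds of work done on one request.
enum class Step { Serialize, Deserialize, NetworkHop, FunctionCall };

struct PathCost {
    int serializations = 0;
    int deserializations = 0;
    int network_hops = 0;
};

// Count how often each kind of step occurs along a request path.
PathCost tally(const std::vector<Step>& path) {
    PathCost cost;
    for (Step s : path) {
        switch (s) {
            case Step::Serialize:    ++cost.serializations;   break;
            case Step::Deserialize:  ++cost.deserializations; break;
            case Step::NetworkHop:   ++cost.network_hops;     break;
            case Step::FunctionCall: break;  // same-JVM call, no wire format
        }
    }
    return cost;
}

// Separate Thrift proxy tier: client -> proxy -> region server.
std::vector<Step> proxy_tier_path() {
    return {
        Step::Serialize,    // client encodes the call with Thrift
        Step::NetworkHop,   // client -> proxy
        Step::Deserialize,  // proxy decodes the Thrift call
        Step::Serialize,    // proxy re-encodes into the Java client format
        Step::NetworkHop,   // proxy -> region server
        Step::Deserialize,  // region server decodes the Java-format buffers
    };
}

// Embedded proxy + native client: client -> region server JVM.
std::vector<Step> embedded_proxy_path() {
    return {
        Step::Serialize,     // client encodes the call with Thrift
        Step::NetworkHop,    // client -> region server JVM
        Step::Deserialize,   // embedded proxy decodes the Thrift call
        Step::FunctionCall,  // direct call into the region server
    };
}
```

Under this model the proxy tier path tallies two serialize/deserialize rounds and two network hops, while the embedded path tallies one of each.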
The issues with the thrift-based approach, however, are these. First, the
Java objects (HTable, Scan, Get, Put, etc.) are not thrift definitions, so
the thrift interface has to be maintained as a separate (and often very
different) set of APIs and updated every time there is an enhancement on
the Java side of things. Second, the proxy tier has to be configured,
tuned and bug-fixed separately from the region server to make sure it is
as performant as the region server, since the overall system will perform
like the slowest component in the stack.
The ideal solution (IMHO) is to have a C++ client which has a compatible
protocol with the Java client, so that there are no significant perf
differences between the two approaches, and there is no separate proxy to
tune. Just a thought of course, might be hard to achieve. Of course we have
just talked about this :) but with the move to protocol buffers in trunk,
this should be easier.
Out of curiosity, why thrift2 - do you specifically need thrift api's to
region servers? Why not "efficient C/C++ client for HBase"?
On 8/22/12 4:06 PM, "Joe Pallas" <firstname.lastname@example.org> wrote:
>On Aug 21, 2012, at 9:29 AM, Stack wrote:
>> On Mon, Aug 20, 2012 at 6:18 PM, Joe Pallas <email@example.com>
>>> Anyone out there actively using the thrift2 interface in 0.94? Thrift
>>>bindings for C++ don't seem to handle optional arguments too well (that
>>>is to say, it seems that optional arguments are not optional).
>>>Unfortunately, checkAndPut uses an optional argument for value to
>>>distinguish between the two cases (value must match vs no cell with
>>>that column qualifier). Any clues on how to work around that
>>>difficulty would be welcome.
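
The two cases described in the question above can be sketched as follows. This is an in-memory illustration of the semantics only, not the thrift2 API or its generated C++ bindings: the Row type and check_and_put function are hypothetical, and a null pointer stands in for an omitted optional value.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for one row: column qualifier -> cell value.
using Row = std::map<std::string, std::string>;

// Apply the put only if the check passes.
//   expected != nullptr : the cell must exist and hold *expected
//   expected == nullptr : the cell must not exist at all
// Returns true when the put was applied.
bool check_and_put(Row& row,
                   const std::string& check_qualifier,
                   const std::string* expected,
                   const std::string& put_qualifier,
                   const std::string& new_value) {
    auto it = row.find(check_qualifier);
    bool check_passed;
    if (expected != nullptr) {
        // "Value must match" case.
        check_passed = (it != row.end() && it->second == *expected);
    } else {
        // "No cell with that column qualifier" case.
        check_passed = (it == row.end());
    }
    if (check_passed) {
        row[put_qualifier] = new_value;
    }
    return check_passed;
}
```

If the bindings always transmit a value, the second branch can never be selected by leaving the argument out, which is the difficulty being described.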
>> If you make a patch, we'll commit it Joe.
>Well, I think the patch really needs to be in Thrift; the only workaround
>I can see is to restructure the hbase.thrift interface file to avoid
>having routines with optional arguments. It seems a shame to break
>compatibility with existing clients for that, and I am not sure if there
>is a way to do it without breaking compatibility. (On the other hand,
>we're talking about thrift2, so it isn't like there are many existing
>The state of Thrift documentation is lamentable. The original white
>paper is the most detailed information I can find about compatibility
>rules. It has enough information to tell me that Thrift doesn't support
>overloading of routine names within a service, because the names are the
>identifiers used to identify the routines. I think that means it isn't
>possible to make a compatible change that would only affect the client
>> Have you seen this?
>> https://github.com/facebook/native-cpp-hbase-client Would it help?
>The native client stuff is certainly interesting, but, as near as I can
Joe Pallas 2012-08-28, 19:07