HBase >> mail # dev >> HBASE-2182


Re: HBASE-2182
A few inline notes below:

On Fri, Jun 29, 2012 at 4:42 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:

> I just posted a pretty early skeleton
> (https://issues.apache.org/jira/browse/HBASE-2182) of what I think a
> Netty-based HBase client/server could look like.
>
> Pros:
>
>   - Faster
>      - Giraph got a 3x perf improvement by dropping Hadoop RPC
>

What's the reference for this? The 3x perf improvement I heard about from
Giraph came from switching to LMAX's Disruptor instead of queues internally.
We could do the same, but I'm not certain the model works well for our use
cases, where RPC processing can end up blocked on disk access, etc.

>      - asynchbase trounced our client when JD benchmarked them
>

I'm still convinced that most of this comes from the way we batch edits to
the server, not from async vs. sync. (In the current sync client, once we
fill up the buffer, we "flush" from the same thread and block until all
buffered edits have made it, rather than flushing in the background.) We
could fix this without going to a fully async model.
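
To make the batching point concrete, here is a minimal sketch of the two
flush styles. It is an illustration only, not code from the HBase client:
SketchWriteBuffer, putBlocking, putBackground, and the sendBatch callback
(which stands in for the RPC to the server) are all made-up names, and edits
are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical write buffer; none of these names come from the HBase client. */
class SketchWriteBuffer {
    private final int capacity;
    private final Consumer<List<String>> sendBatch; // stands in for the RPC to the server
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    private List<String> buffer = new ArrayList<>();
    private CompletableFuture<Void> pending = CompletableFuture.completedFuture(null);

    SketchWriteBuffer(int capacity, Consumer<List<String>> sendBatch) {
        this.capacity = capacity;
        this.sendBatch = sendBatch;
    }

    /** Swaps in a fresh buffer and returns the old one. Callers must hold the lock. */
    private List<String> takeBatch() {
        List<String> batch = buffer;
        buffer = new ArrayList<>();
        return batch;
    }

    /** Synchronous style: the caller that fills the buffer also pays for the flush. */
    synchronized void putBlocking(String edit) {
        buffer.add(edit);
        if (buffer.size() >= capacity) {
            sendBatch.accept(takeBatch()); // caller blocks here until the batch is sent
        }
    }

    /** Background style: hand the full batch to the flusher thread and return at once. */
    synchronized void putBackground(String edit) {
        buffer.add(edit);
        if (buffer.size() >= capacity) {
            List<String> batch = takeBatch();
            // Chaining onto the previous future keeps batches in submission order.
            pending = pending.thenRunAsync(() -> sendBatch.accept(batch), flusher);
        }
    }

    /** Sends any remaining edits, waits for all outstanding batches, stops the flusher. */
    void close() {
        CompletableFuture<Void> last;
        synchronized (this) {
            if (!buffer.isEmpty()) {
                List<String> batch = takeBatch();
                pending = pending.thenRunAsync(() -> sendBatch.accept(batch), flusher);
            }
            last = pending;
        }
        last.join();
        flusher.shutdown();
    }
}
```

The background path in this sketch has no back-pressure and no error
handling. A real client would need to bound the number of outstanding batches
and surface failures to the caller.
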
>   - Could encourage things to be a little more modular if everything isn't
>   hanging directly off of HRegionServer
>
Sure, but I'm not sure I see why this is Netty vs. not-Netty.

>   - Netty is better about thread usage than the Hadoop RPC server.
>
Can you explain further?

>   - Pretty easy to define an rpc protocol after all of the work on
>   protobuf (Thanks everyone)
>   - Decoupling the rpc server library from the hadoop library could allow
>   us to rev the server code more easily.
>   - The filter model is very easy to work with.
>      - Security can be just a single filter.
>      - Logging can be another.
>      - Stats can be another.
>
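
The filter model described in the list above can be sketched without Netty at
all. The following is a hypothetical, framework-free illustration:
SketchRequest, SketchFilter, and SketchPipeline are invented names, not Netty
or HBase classes, and Netty's own pipeline API differs in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical request type; a real RPC layer would carry a decoded message. */
final class SketchRequest {
    final String user;
    final String method;

    SketchRequest(String user, String method) {
        this.user = user;
        this.method = method;
    }
}

/** One stage in the chain: it may inspect the request, reject it, or pass it on. */
interface SketchFilter {
    String handle(SketchRequest request, Function<SketchRequest, String> next);
}

/** Runs filters in the order they were added, then calls the endpoint. */
final class SketchPipeline {
    private final List<SketchFilter> filters = new ArrayList<>();
    private final Function<SketchRequest, String> endpoint;

    SketchPipeline(Function<SketchRequest, String> endpoint) {
        this.endpoint = endpoint;
    }

    SketchPipeline add(SketchFilter filter) {
        filters.add(filter);
        return this;
    }

    String dispatch(SketchRequest request) {
        // Build the chain back to front so the first filter added runs first.
        Function<SketchRequest, String> chain = endpoint;
        for (int i = filters.size() - 1; i >= 0; i--) {
            SketchFilter filter = filters.get(i);
            Function<SketchRequest, String> next = chain;
            chain = req -> filter.handle(req, next);
        }
        return chain.apply(request);
    }
}
```

In this sketch a security filter would return an error without calling next,
while logging and stats filters would record something and then delegate to
next, so each concern stays in its own small class.
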
> Cons:
>
>   - Netty and non apache rpc servers don't play well together.  They might
>   be able to, but I haven't gotten there yet.
>
What do you mean by "non apache rpc servers"?

>   - Complexity
>      - Two different servers in the src
>      - Confusing users who don't know which to pick
>   - Non-blocking could make the client harder to write.
>
>
> I'm really just trying to gauge what people think of the direction and if
> it's still something that is wanted.  The code is a loooooong way from even
> being a tech demo, and I'm not a netty expert, so suggestions would be
> welcomed.
>
> Thoughts? Are people interested in this? Should I push this to my github
> so others can help?
>

IMO, I'd want to see a noticeable perf difference from the change.
Unfortunately, it would take a fair amount of work to get to the point where
you could benchmark it. But if you're willing to spend the time to get to
that point, it seems worth investigating.

--
Todd Lipcon
Software Engineer, Cloudera