HBase >> mail # dev >> HBASE-2182


Re: HBASE-2182
My comments are inline with yours.

On Fri, Jun 29, 2012 at 5:04 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> A few inline notes below:
>
> On Fri, Jun 29, 2012 at 4:42 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:
>
> > I just posted a pretty early skeleton
> > (https://issues.apache.org/jira/browse/HBASE-2182) of what I think a
> > Netty-based HBase client/server could look like.
> >
> > Pros:
> >
> >   - Faster
> >      - Giraph got a 3x perf improvement by dropping hadoop rpc
> >
>
> What's the reference for this? The 3x perf I heard about from Giraph was
> from switching to using LMAX's Disruptor instead of queues, internally. We
> could do the same, but I'm not certain the model works well for our use
> cases where the RPC processing can end up blocked on disk access, etc.
>
>

The references are https://reviews.apache.org/r/5074/
and http://www.youtube.com/watch?v=b5Qmz4zPj-M&feature=youtu.be, though I
don't think he says the number 3x in the presentation.

>
> >      - asynchbase trounces our client when JD benchmarked them
> >
>
> I'm still convinced that the majority of this has to do with the way our
> batching happens to the server, not async vs sync. (in the current sync
> client, once we fill up the buffer, we "flush" from the same thread, and
> block the flush until all buffered edits have made it, vs doing it in the
> background). We could fix this without going to a fully async model.
>

I am too. However, the single thread on the server that decodes all
requests is also starting to be a concern for me.

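As a rough illustration of the background-flush alternative described in the quoted text above, here is a minimal Java sketch. It is not HBase's client code: `BackgroundFlusher`, `put`, `send`, and the batch size are hypothetical names, and `send` only counts edits where a real client would do a network write. The sketch also leaves out backpressure, so the executor's queue is unbounded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: full batches are flushed on a background thread, not the caller's. */
public class BackgroundFlusher {
    private final int batchSize;
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    private final AtomicInteger flushedEdits = new AtomicInteger();
    private List<String> buffer = new ArrayList<>();

    public BackgroundFlusher(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Buffers an edit; when the buffer fills, hands the batch off and returns at once. */
    public synchronized void put(String edit) {
        buffer.add(edit);
        if (buffer.size() >= batchSize) {
            submitBatch();
        }
    }

    /** Swaps in a fresh buffer and queues the full one for the flusher thread. */
    private synchronized void submitBatch() {
        final List<String> batch = buffer;
        buffer = new ArrayList<>();
        flusher.execute(() -> send(batch));
    }

    /** Stand-in for the network write; here it only counts the edits. */
    private void send(List<String> batch) {
        flushedEdits.addAndGet(batch.size());
    }

    public int flushedEdits() {
        return flushedEdits.get();
    }

    /** Flushes any remainder and waits for the background thread to drain. */
    public void close() {
        synchronized (this) {
            if (!buffer.isEmpty()) {
                submitBatch();
            }
        }
        flusher.shutdown();
        try {
            flusher.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        BackgroundFlusher client = new BackgroundFlusher(3);
        for (int i = 0; i < 7; i++) {
            client.put("edit-" + i); // never blocks on the flush itself
        }
        client.close();
        System.out.println(client.flushedEdits());
    }
}
```

The caller still uses a synchronous-looking `put`, which is the point made above: the flush can move off the calling thread without adopting a fully async client API.
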
>
> >   - Could encourage things to be a little more modular if everything isn't
> >   hanging directly off of HRegionServer
> >
> Sure, but not sure I see why this is Netty vs not-Netty
>

If we have two implementations of servers, it would highlight that shared
things should be their own objects, each with a single responsibility.
>
> >   - Netty is better about thread usage than hadoop rpc server.
> >
> Can you explain further?
>

By default, Netty's threads are aware of memory pressure and will spin up
more threads to keep the number of queued requests under a given threshold.
They will also stop accepting work if the max number of threads has been
reached. Right now we keep a queue of 10k (? not sure about the number, but
it's just a fixed number) RPC requests. There's no thought given to their
size, so the server can OOM if all 10k are puts of large amounts of data.

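To illustrate the difference between capping the queue by request count and capping it by payload size, here is a minimal Java sketch. It is not Netty's executor or the hadoop rpc server; `ByteBoundedQueue`, `offer`, `poll`, and the 1024-byte budget are hypothetical names and values chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch: admits requests by total queued bytes, not by request count. */
public class ByteBoundedQueue {
    private final Queue<byte[]> queue = new ArrayDeque<>();
    private final long maxBytes;
    private long queuedBytes = 0;

    public ByteBoundedQueue(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Returns false if the payload would push the queue past its byte budget. */
    public synchronized boolean offer(byte[] payload) {
        if (queuedBytes + payload.length > maxBytes) {
            return false; // caller should push back on the client instead of queuing
        }
        queue.add(payload);
        queuedBytes += payload.length;
        return true;
    }

    /** Removes the oldest request and releases its bytes; returns null when empty. */
    public synchronized byte[] poll() {
        byte[] payload = queue.poll();
        if (payload != null) {
            queuedBytes -= payload.length;
        }
        return payload;
    }

    public synchronized long queuedBytes() {
        return queuedBytes;
    }

    public static void main(String[] args) {
        ByteBoundedQueue q = new ByteBoundedQueue(1024);
        System.out.println(q.offer(new byte[600])); // fits within the budget
        System.out.println(q.offer(new byte[600])); // would exceed 1024 bytes
        q.poll();
        System.out.println(q.offer(new byte[600])); // fits again after draining
    }
}
```

With a count-only limit, 10k large puts are all admitted; with a byte budget, memory use is bounded no matter how large individual requests are.
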
> >   - Pretty easy to define an rpc protocol after all of the work on
> >   protobuf (Thanks everyone)
> >   - Decoupling the rpc server library from the hadoop library could allow
> >   us to rev the server code easier.
> >   - The filter model is very easy to work with.
> >      - Security can be just a single filter.
> >      - Logging can be another
> >      - Stats can be another.
> >
> > Cons:
> >
> >   - Netty and non apache rpc servers don't play well together.  They might
> >   be able to but I haven't gotten there yet.
> >
> What do you mean "non apache rpc servers"?

Typo.  I meant non-netty/apache rpc server

>
> >   - Complexity
> >      - Two different servers in the src
> >      - Confusing users who don't know which to pick
> >   - Non-blocking could make the client harder to write.
> >
> >
> > I'm really just trying to gauge what people think of the direction and if
> > it's still something that is wanted.  The code is a loooooong way from even
> > being a tech demo, and I'm not a netty expert, so suggestions would be
> > welcomed.
> >
> > Thoughts? Are people interested in this? Should I push this to my github
> > so others can help?
> >
>
> IMO, I'd want to see a noticeable perf difference from the change -
> unfortunately it would take a fair amount of work to get to the point where
> you could benchmark it. But if you're willing to spend the time to get to
> that point, seems worth investigating.

Netty's use of buffers that wrap protobuf buffers could save us an array
copy.
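
As a small illustration of what wrapping saves compared with copying, the sketch below uses the JDK's `java.nio.ByteBuffer` as a stand-in. It does not use Netty's buffer classes or protobuf, and `WrapVsCopy` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch: wrapping shares the backing array, copying duplicates it. */
public class WrapVsCopy {
    /** Wraps the array: the buffer shares the caller's bytes, so nothing is copied. */
    public static ByteBuffer wrap(byte[] serialized) {
        return ByteBuffer.wrap(serialized);
    }

    /** Copies first: one extra allocation and one array copy per message. */
    public static ByteBuffer copy(byte[] serialized) {
        return ByteBuffer.wrap(Arrays.copyOf(serialized, serialized.length));
    }

    public static void main(String[] args) {
        byte[] serialized = {1, 2, 3, 4}; // stand-in for an already-serialized message
        System.out.println(wrap(serialized).array() == serialized); // shares the array
        System.out.println(copy(serialized).array() == serialized); // separate array
    }
}
```

The trade-off is that a wrapped buffer sees any later change to the original array, so the caller must not reuse that array until the write has completed.
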

However, you're right: a real benchmark that makes this more than just
guesswork is a ways away.