HBase, mail # dev - Re: HBASE-2182


Re: HBASE-2182
Elliott Clark 2012-06-30, 00:59
Sorry, I only alluded to it in the bullet point about the filter model.  I
would imagine that as a filter (or two) in the channel stack.  It's
honestly something that I haven't gotten to looking at in depth yet.
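
A minimal sketch of the filter idea, in plain Java rather than Netty's own
handler API: each concern (security, logging, stats) is one stage in a chain,
and a stage can stop a request from going any further. Every name here
(RpcRequest, RpcFilter, FilterChain, and the three filters) is hypothetical,
and the security check is only a stand-in for real SASL/Kerberos negotiation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical request type; not an HBase or Netty class. */
final class RpcRequest {
    final String user;
    final String method;

    RpcRequest(String user, String method) {
        this.user = user;
        this.method = method;
    }
}

/** One stage in the stack. Returning false stops the request. */
interface RpcFilter {
    boolean handle(RpcRequest request);
}

/** Stand-in for authentication: rejects requests that carry no user. */
final class SecurityFilter implements RpcFilter {
    public boolean handle(RpcRequest request) {
        return request.user != null && !request.user.isEmpty();
    }
}

/** Records one line per request that reaches this stage. */
final class LoggingFilter implements RpcFilter {
    final List<String> lines = new ArrayList<>();

    public boolean handle(RpcRequest request) {
        lines.add(request.user + " -> " + request.method);
        return true;
    }
}

/** Counts requests that reach this stage. */
final class StatsFilter implements RpcFilter {
    int served;

    public boolean handle(RpcRequest request) {
        served++;
        return true;
    }
}

/** Runs filters in the order they were added; stops at the first rejection. */
final class FilterChain {
    private final List<RpcFilter> filters = new ArrayList<>();

    FilterChain add(RpcFilter filter) {
        filters.add(filter);
        return this;
    }

    boolean dispatch(RpcRequest request) {
        for (RpcFilter filter : filters) {
            if (!filter.handle(request)) {
                return false;
            }
        }
        return true;
    }
}
```

With this shape, a chain built as security, then logging, then stats rejects
an unauthenticated request before it is ever logged or counted.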

On Fri, Jun 29, 2012 at 5:34 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> Without SASL/krb/security integration with the rest of Hadoop this would
> be a nonstarter for us. I didn't see that mentioned?
>
> On Jun 29, 2012, at 5:04 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>
> > A few inline notes below:
> >
> > On Fri, Jun 29, 2012 at 4:42 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:
> >
> >> I just posted a pretty early skeleton
> >> (https://issues.apache.org/jira/browse/HBASE-2182) of what I think a
> >> Netty-based HBase client/server could look like.
> >>
> >> Pros:
> >>
> >>  - Faster
> >>     - Giraph got a 3x perf improvement by dropping Hadoop RPC
> >>
> >
> > What's the reference for this? The 3x perf I heard about from Giraph was
> > from switching to using LMAX's Disruptor instead of queues, internally. We
> > could do the same, but I'm not certain the model works well for our use
> > cases where the RPC processing can end up blocked on disk access, etc.
> >
> >
> >>     - Asynchbase trounced our client when JD benchmarked them
> >>
> >
> > I'm still convinced that the majority of this has to do with the way our
> > batching happens to the server, not async vs sync. (in the current sync
> > client, once we fill up the buffer, we "flush" from the same thread, and
> > block the flush until all buffered edits have made it, vs doing it in the
> > background). We could fix this without going to a fully async model.
> >
> >
> >>  - Could encourage things to be a little more modular if everything isn't
> >>  hanging directly off of HRegionServer
> >>
> > Sure, but not sure I see why this is Netty vs not-Netty
> >
> >
> >>  - Netty is better about thread usage than the Hadoop RPC server.
> >>
> > Can you explain further?
> >
> >
> >>  - Pretty easy to define an rpc protocol after all of the work on
> >>  protobuf (Thanks everyone)
> >>  - Decoupling the RPC server library from the Hadoop library could allow
> >>  us to rev the server code more easily.
> >>  - The filter model is very easy to work with.
> >>     - Security can be just a single filter.
> >>     - Logging can be another.
> >>     - Stats can be another.
> >>
> >> Cons:
> >>
> >>  - Netty and non apache rpc servers don't play well together.  They might
> >>  be able to, but I haven't gotten there yet.
> >>
> > What do you mean "non apache rpc servers"?
> >
> >
> >>  - Complexity
> >>     - Two different servers in the src
> >>     - Confusing users who don't know which to pick
> >>  - Non-blocking could make the client harder to write.
> >>
> >>
> >> I'm really just trying to gauge what people think of the direction and if
> >> it's still something that is wanted.  The code is a loooooong way from even
> >> being a tech demo, and I'm not a netty expert, so suggestions would be
> >> welcomed.
> >>
> >> Thoughts? Are people interested in this? Should I push this to my GitHub
> >> so others can help?
> >>
> >
> > IMO, I'd want to see a noticeable perf difference from the change -
> > unfortunately it would take a fair amount of work to get to the point where
> > you could benchmark it. But if you're willing to spend the time to get to
> > that point, seems worth investigating.
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
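
Todd's batching point quoted above can be sketched without going fully async:
when the buffer fills, the calling thread hands the batch to a background
thread and keeps accepting edits instead of blocking on the flush. This is an
illustration only. BackgroundFlushBuffer and its sender callback are
hypothetical names, not the HBase client's API, and the executor's unbounded
queue means this sketch applies no backpressure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical write buffer that flushes full batches on a background thread. */
final class BackgroundFlushBuffer {
    private final int capacity;
    private final Consumer<List<String>> sender;
    // A single worker thread keeps batches in the order they were handed off.
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    private List<String> pending = new ArrayList<>();

    BackgroundFlushBuffer(int capacity, Consumer<List<String>> sender) {
        this.capacity = capacity;
        this.sender = sender;
    }

    /** Buffers an edit; a full buffer is handed off without waiting for the send. */
    synchronized void put(String edit) {
        pending.add(edit);
        if (pending.size() >= capacity) {
            flushAsync();
        }
    }

    /** Swaps in a fresh buffer and queues the old one for sending. */
    synchronized void flushAsync() {
        if (pending.isEmpty()) {
            return;
        }
        final List<String> batch = pending;
        pending = new ArrayList<>();
        flusher.execute(() -> sender.accept(batch));
    }

    /** Flushes what is left and waits for queued batches; false on timeout or interrupt. */
    boolean close(long timeoutMillis) {
        flushAsync();
        flusher.shutdown();
        try {
            return flusher.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Here the caller of put() only pays for the buffer swap; the cost of actually
sending a batch lands on the flusher thread. A real client would still need to
bound the queue and report send failures back to the caller.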