|
Elliott Clark
2012-06-29, 23:42
Todd Lipcon
2012-06-30, 00:04
Elliott Clark
2012-06-30, 00:21
Andrew Purtell
2012-06-30, 00:34
Elliott Clark
2012-06-30, 00:59
Andrew Purtell
2012-06-30, 01:10
Ryan Rawson
2012-06-30, 08:27
N Keywal
2012-06-30, 11:50
Elliott Clark
2012-06-30, 21:43
|
-
HBASE-2182 Elliott Clark 2012-06-29, 23:42
I just posted a pretty early skeleton
(https://issues.apache.org/jira/browse/HBASE-2182) on what I think a netty
based hbase client/server could look like.

Pros:

- Faster
  - Giraph got a 3x perf improvement by dropping hadoop rpc
  - Asynchbase trounces our client when JD benchmarked them
- Could encourage things to be a little more modular if everything isn't
  hanging directly off of HRegionServer
- Netty is better about thread usage than hadoop rpc server.
- Pretty easy to define an rpc protocol after all of the work on
  protobuf (Thanks everyone)
- Decoupling the rpc server library from the hadoop library could allow
  us to rev the server code more easily.
- The filter model is very easy to work with.
  - Security can be just a single filter.
  - Logging can be another.
  - Stats can be another.

Cons:

- Netty and non apache rpc servers don't play well together. They might
  be able to but I haven't gotten there yet.
- Complexity
  - Two different servers in the src
  - Confusing users who don't know which to pick
- Non-blocking could make the client harder to write.

I'm really just trying to gauge what people think of the direction and if
it's still something that is wanted. The code is a loooooong way from even
being a tech demo, and I'm not a netty expert, so suggestions would be
welcomed.

Thoughts? Are people interested in this? Should I push this to my github
so others can help?
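
For readers unfamiliar with the filter model mentioned in the pros, here is a minimal sketch in plain Java. It does not use Netty and makes no claim about Netty's API or about the HBASE-2182 patch; `RpcFilter`, `FilterChain` and the three example filters are hypothetical names, and requests are plain strings purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical pipeline stage; an assumption for illustration, not a Netty type. */
interface RpcFilter {
    /** Returns the (possibly transformed) request, or throws to reject it. */
    String apply(String request);
}

/** Runs each filter in the order it was added, feeding one into the next. */
final class FilterChain {
    private final List<RpcFilter> filters = new ArrayList<>();

    FilterChain add(RpcFilter filter) {
        filters.add(filter);
        return this;
    }

    String handle(String request) {
        String current = request;
        for (RpcFilter filter : filters) {
            current = filter.apply(current);
        }
        return current;
    }
}

final class FilterChainDemo {
    static final AtomicLong requestCount = new AtomicLong();
    static final List<String> log = new ArrayList<>();

    /** Builds a chain with one security, one logging and one stats filter. */
    static FilterChain build() {
        return new FilterChain()
            .add(req -> {
                // Security stand-in: a real filter would do SASL/Kerberos here.
                if (!req.startsWith("token:")) {
                    throw new SecurityException("unauthenticated request");
                }
                return req.substring("token:".length());
            })
            .add(req -> { log.add(req); return req; })                   // logging
            .add(req -> { requestCount.incrementAndGet(); return req; }); // stats
    }

    public static void main(String[] args) {
        FilterChain chain = build();
        System.out.println(chain.handle("token:get row1"));
        System.out.println("requests seen: " + requestCount.get());
    }
}
```

Each concern lives in its own stage, so adding or removing one does not touch the others. Whether real SASL negotiation fits into a single stage this neatly is exactly what is questioned later in the thread.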
-
Re: HBASE-2182 Todd Lipcon 2012-06-30, 00:04
A few inline notes below:

On Fri, Jun 29, 2012 at 4:42 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:

> I just posted a pretty early skeleton
> (https://issues.apache.org/jira/browse/HBASE-2182) on what I think a netty
> based hbase client/server could look like.
>
> Pros:
>
> - Faster
>   - Giraph got a 3x perf improvement by dropping hadoop rpc

What's the reference for this? The 3x perf I heard about from Giraph was
from switching to LMAX's Disruptor instead of queues, internally. We could
do the same, but I'm not certain the model works well for our use cases,
where the RPC processing can end up blocked on disk access, etc.

>   - Asynchbase trounces our client when JD benchmarked them

I'm still convinced that the majority of this has to do with the way our
batching to the server happens, not async vs sync. In the current sync
client, once we fill up the buffer, we "flush" from the same thread and
block the flush until all buffered edits have made it, instead of doing it
in the background. We could fix this without going to a fully async model.

> - Could encourage things to be a little more modular if everything isn't
>   hanging directly off of HRegionServer

Sure, but I'm not sure I see why this is Netty vs not-Netty.

> - Netty is better about thread usage than hadoop rpc server.

Can you explain further?

> - Pretty easy to define an rpc protocol after all of the work on
>   protobuf (Thanks everyone)
> - Decoupling the rpc server library from the hadoop library could allow
>   us to rev the server code easier.
> - The filter model is very easy to work with.
>   - Security can be just a single filter.
>   - Logging can be another.
>   - Stats can be another.
>
> Cons:
>
> - Netty and non apache rpc servers don't play well together. They might
>   be able to but I haven't gotten there yet.

What do you mean by "non apache rpc servers"?

> - Complexity
>   - Two different servers in the src
>   - Confusing users who don't know which to pick
> - Non-blocking could make the client harder to write.
>
> I'm really just trying to gauge what people think of the direction and if
> it's still something that is wanted. The code is a loooooong way from even
> being a tech demo, and I'm not a netty expert, so suggestions would be
> welcomed.
>
> Thoughts? Are people interested in this? Should I push this to my github
> so others can help?

IMO, I'd want to see a noticeable perf difference from the change.
Unfortunately it would take a fair amount of work to get to the point where
you could benchmark it. But if you're willing to spend the time to get to
that point, it seems worth investigating.

--
Todd Lipcon
Software Engineer, Cloudera
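
The batching point above (flush in the background instead of blocking the caller) can be sketched as follows. This is a hypothetical illustration, not the HBase client: `BackgroundFlushBuffer`, its capacity parameter and the `sink` callback are all assumed names, and edits are plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical write buffer that hands full batches to a background thread. */
final class BackgroundFlushBuffer implements AutoCloseable {
    private final int capacity;
    private final Consumer<List<String>> sink; // stands in for "send batch to server"
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    private List<String> buffer = new ArrayList<>();

    BackgroundFlushBuffer(int capacity, Consumer<List<String>> sink) {
        this.capacity = capacity;
        this.sink = sink;
    }

    /** Buffers one edit; the caller never waits for the network round trip. */
    synchronized void put(String edit) {
        buffer.add(edit);
        if (buffer.size() >= capacity) {
            flushAsync();
        }
    }

    /** Swaps in a fresh buffer and queues the full one. Caller must hold the lock. */
    private void flushAsync() {
        if (buffer.isEmpty()) {
            return;
        }
        final List<String> full = buffer;
        buffer = new ArrayList<>();
        flusher.submit(() -> sink.accept(full));
    }

    /** Flushes whatever is left and waits for queued batches to drain. */
    @Override
    public void close() {
        synchronized (this) {
            flushAsync();
        }
        flusher.shutdown();
        try {
            flusher.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class BackgroundFlushDemo {
    /** Writes five edits through a buffer of capacity two and returns the batches sent. */
    static List<List<String>> run() {
        List<List<String>> batches = Collections.synchronizedList(new ArrayList<>());
        BackgroundFlushBuffer buf = new BackgroundFlushBuffer(2, batches::add);
        for (int i = 0; i < 5; i++) {
            buf.put("edit-" + i);
        }
        buf.close();
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The sketch deliberately omits backpressure and error reporting: the executor's queue is unbounded, and a failed batch is silently lost. A real client would need both, which is part of why the blocking flush is simpler to reason about.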
-
Re: HBASE-2182 Elliott Clark 2012-06-30, 00:21
Comments inline with your inline notes.

On Fri, Jun 29, 2012 at 5:04 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> A few inline notes below:
>
> On Fri, Jun 29, 2012 at 4:42 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:
>
> > I just posted a pretty early skeleton
> > (https://issues.apache.org/jira/browse/HBASE-2182) on what I think a
> > netty based hbase client/server could look like.
> >
> > Pros:
> >
> > - Faster
> >   - Giraph got a 3x perf improvement by dropping hadoop rpc
>
> What's the reference for this? The 3x perf I heard about from Giraph was
> from switching to using LMAX's Disruptor instead of queues, internally. We
> could do the same, but I'm not certain the model works well for our use
> cases where the RPC processing can end up blocked on disk access, etc.

https://reviews.apache.org/r/5074/ and in
http://www.youtube.com/watch?v=b5Qmz4zPj-M&feature=youtu.be though I don't
think that he says the number 3x in the presentation.

> >   - Asynchbase trounces our client when JD benchmarked them
>
> I'm still convinced that the majority of this has to do with the way our
> batching happens to the server, not async vs sync. (in the current sync
> client, once we fill up the buffer, we "flush" from the same thread, and
> block the flush until all buffered edits have made it, vs doing it in the
> background). We could fix this without going to a fully async model.

I am too. However, the single thread on the server that decodes all
requests is also starting to be a concern for me.

> > - Could encourage things to be a little more modular if everything
> >   isn't hanging directly off of HRegionServer
>
> Sure, but not sure I see why this is Netty vs not-Netty

If we have two implementations of servers, it would highlight that shared
things should be their own object with a single responsibility.

> > - Netty is better about thread usage than hadoop rpc server.
>
> Can you explain further?

By default Netty's threads are aware of memory pressure and will spin more
up to keep the number of requests queued under a given threshold. They will
also stop accepting work if the max number of threads has been reached.
Right now we will keep a queue of 10k (not sure about the number, but it's
just a fixed number) of rpc requests. There's no thought to the size of
them, so the thread can OOM if all 10k are puts of large amounts of data.

> > - Pretty easy to define an rpc protocol after all of the work on
> >   protobuf (Thanks everyone)
> > - Decoupling the rpc server library from the hadoop library could allow
> >   us to rev the server code easier.
> > - The filter model is very easy to work with.
> >   - Security can be just a single filter.
> >   - Logging can be another.
> >   - Stats can be another.
> >
> > Cons:
> >
> > - Netty and non apache rpc servers don't play well together. They
> >   might be able to but I haven't gotten there yet.
>
> What do you mean "non apache rpc servers"?

Typo. I meant non-netty/apache rpc server.

> > - Complexity
> >   - Two different servers in the src
> >   - Confusing users who don't know which to pick
> > - Non-blocking could make the client harder to write.
> >
> > I'm really just trying to gauge what people think of the direction and if
> > it's still something that is wanted. The code is a loooooong way from
> > even being a tech demo, and I'm not a netty expert, so suggestions would
> > be welcomed.
> >
> > Thoughts? Are people interested in this? Should I push this to my github
> > so others can help?
>
> IMO, I'd want to see a noticeable perf difference from the change -
> unfortunately it would take a fair amount of work to get to the point where
> you could benchmark it. But if you're willing to spend the time to get to
> that point, seems worth investigating.

Netty's use of Buffers that wrap protobuf buffers could save us an array
copy. However, you're right: a real benchmark that makes this more than
just guesswork is a ways away.
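
The concern about a queue bounded by request count rather than by size can be illustrated with a small sketch. This is not Netty's or HBase's implementation; `ByteBoundedQueue` is a hypothetical class that admits requests against a byte budget, so a burst of large puts is refused instead of exhausting the heap.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical call queue limited by total payload bytes, not by entry count. */
final class ByteBoundedQueue {
    private final long maxBytes;
    private long queuedBytes;
    private final Queue<byte[]> items = new ArrayDeque<>();

    ByteBoundedQueue(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Admits the request only if it fits in the remaining byte budget. */
    synchronized boolean offer(byte[] request) {
        if (queuedBytes + request.length > maxBytes) {
            return false; // caller can push back on the client or stop reading
        }
        items.add(request);
        queuedBytes += request.length;
        return true;
    }

    /** Removes the oldest request and returns its bytes to the budget. */
    synchronized byte[] poll() {
        byte[] request = items.poll();
        if (request != null) {
            queuedBytes -= request.length;
        }
        return request;
    }

    synchronized long queuedBytes() {
        return queuedBytes;
    }
}

final class ByteBoundedQueueDemo {
    public static void main(String[] args) {
        ByteBoundedQueue queue = new ByteBoundedQueue(1024);
        System.out.println(queue.offer(new byte[600])); // true: 600 <= 1024
        System.out.println(queue.offer(new byte[600])); // false: 1200 > 1024
        queue.poll();                                   // frees 600 bytes
        System.out.println(queue.offer(new byte[600])); // true again
    }
}
```

A count-based queue of the same depth would have accepted both 600-byte requests; with multi-megabyte puts the difference is what decides whether the server survives the burst.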
-
Re: HBASE-2182 Andrew Purtell 2012-06-30, 00:34
Without SASL/krb/security integration with the rest of Hadoop this would be a nonstarter for us. I didn't see that mentioned?
-
Re: HBASE-2182 Elliott Clark 2012-06-30, 00:59
Sorry, I only alluded to it in the bullet point about the filter model. I
would imagine that as a filter (or two) in the channel stack. It's
honestly something that I haven't gotten to looking at in depth yet.

On Fri, Jun 29, 2012 at 5:34 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> Without SASL/krb/security integration with the rest of Hadoop this would
> be a nonstarter for us. I didn't see that mentioned?
-
Re: HBASE-2182 Andrew Purtell 2012-06-30, 01:10
I worry it's more complicated than that given nobody seems to have done it, at least... "netty SASL" or "netty wrap SASL" or "netty SASL socket" turns up paltry results in a Google search. Avro considered it but didn't. We considered it for Zookeeper but didn't. (Excluded very early due to ZK authentication design particulars though.)
- Andy
-
Re: HBASE-2182 Ryan Rawson 2012-06-30, 08:27
On Fri, Jun 29, 2012 at 5:04 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:

>>   - Asynchbase trounces our client when JD benchmarked them
>
> I'm still convinced that the majority of this has to do with the way our
> batching happens to the server, not async vs sync. (in the current sync
> client, once we fill up the buffer, we "flush" from the same thread, and
> block the flush until all buffered edits have made it, vs doing it in the
> background). We could fix this without going to a fully async model.

I also agree here. If you do the a priori code analysis, it becomes
obvious that the issue is that slower regionservers can hold up entire
batches even if 90%+ of the Puts were already acked... And don't forget
that we used to issue Puts to regionservers SERIALLY until we wrote the
current parallelism code (not that the code is great, but it was
relatively easy to fix at the time).
-
Re: HBASE-2182 N Keywal 2012-06-30, 11:50
On Sat, Jun 30, 2012 at 10:27 AM, Ryan Rawson <[EMAIL PROTECTED]> wrote:

> On Fri, Jun 29, 2012 at 5:04 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>>> I just posted a pretty early skeleton
>>> (https://issues.apache.org/jira/browse/HBASE-2182) on what I think a netty
>>> based hbase client/server could look like.
>>> Pros:
>>> - Faster
>>>   - Giraph got a 3x perf improvement by dropping hadoop rpc
>> What's the reference for this? The 3x perf I heard about from Giraph was
>> from switching to using LMAX's Disruptor instead of queues, internally. We
>> could do the same, but I'm not certain the model works well for our use
>> cases where the RPC processing can end up blocked on disk access, etc.
>>>   - Asynchbase trounces our client when JD benchmarked them
>>
>> I'm still convinced that the majority of this has to do with the way our
>> batching happens to the server, not async vs sync. (in the current sync
>> client, once we fill up the buffer, we "flush" from the same thread, and
>> block the flush until all buffered edits have made it, vs doing it in the
>> background). We could fix this without going to a fully async model.
>
> I also agree here, if you do the a priori code analysis, it becomes
> obvious that the issue is that slower regionservers can hold up entire
> batches even if 90%+ of the Puts were already acked...

fwiw, I had something roughly similar in mind (work in the background
instead of waiting for the result of the first part). I created HBASE-6295
to detail what I was thinking about.
-
Re: HBASE-2182 Elliott Clark 2012-06-30, 21:43
Whoops, dev got dropped; adding it back on.

Since everything is already a protobuf, to me it doesn't really make sense
to keep the hadoop serialization overhead too. In addition, the netty
protocol allows for zero copy, which would be pretty tough to implement
with the older rpc format.

On Sat, Jun 30, 2012 at 2:31 PM, ryan rawson <[EMAIL PROTECTED]> wrote:

> But is it worth losing compatibility? If you have compatibility you can do
> each side separately.
>
> On Jun 30, 2012, at 1:57 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:
>
> That protocol was chosen because netty provides a ready-built
> implementation of the encoder and the decoder. Looking at the code for the
> two, they are pretty easy to work on if needed.
>
> On Sat, Jun 30, 2012 at 1:24 AM, Ryan Rawson <[EMAIL PROTECTED]> wrote:
>
>> Hey Elliott,
>>
>> I saw this comment in the bug:
>>
>> "All communication takes place through a wrapped protocol buffer protocol:
>> [32 bit length field, protobuf data]"
>>
>> HBase RPC already includes a framing protocol; that is, every message
>> is prefixed with a 4-byte size of the following data. E.g.,
>> HBaseServer.java:1157 is where the request buffer is allocated server
>> side, and HBaseServer.java:353 for the response to the client.
>>
>> I'm interested in better HBase network clients, and I have a few ideas
>> of how to approach the problem. Netty is obviously the ideal way to
>> approach it I think.
>>
>> What is your intent and ability to work on this line of code?
>>
>> -ryan
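
The framing both sides describe in this last message, a 32-bit length field followed by the payload, can be sketched in plain Java. This is an illustration under assumptions, not the HBASE-2182 code or the existing HBaseServer code: `LengthPrefixedFraming` is a hypothetical class, the length is assumed to be big-endian, and opaque bytes stand in for the serialized protobuf message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical codec for frames laid out as [4-byte length][payload]. */
final class LengthPrefixedFraming {

    /** Prepends the payload length as a big-endian 32-bit integer. */
    static byte[] encode(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /**
     * Extracts every complete frame from the buffer. A partial trailing
     * frame is left unread so the caller can retry once more bytes arrive.
     */
    static List<byte[]> decode(ByteBuffer in) {
        List<byte[]> frames = new ArrayList<>();
        while (in.remaining() >= 4) {
            in.mark();
            int length = in.getInt();
            if (length < 0) {
                throw new IllegalArgumentException("negative frame length: " + length);
            }
            if (in.remaining() < length) {
                in.reset(); // rewind to the start of the incomplete frame
                break;
            }
            byte[] payload = new byte[length];
            in.get(payload);
            frames.add(payload);
        }
        return frames;
    }

    public static void main(String[] args) {
        byte[] frame = encode("get row1".getBytes(StandardCharsets.UTF_8));
        ByteBuffer wire = ByteBuffer.allocate(2 * frame.length);
        wire.put(frame).put(frame, 0, 5); // one full frame plus a truncated one
        wire.flip();
        for (byte[] payload : decode(wire)) {
            System.out.println(new String(payload, StandardCharsets.UTF_8));
        }
        System.out.println("bytes left for next read: " + wire.remaining());
    }
}
```

Because the frame layout is the same 4-byte prefix in both descriptions, the compatibility question raised above is mostly about what goes inside the payload (Hadoop serialization versus protobuf), not about the framing itself.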