Re: Netty/Avro IPC problem: channel closed
James Baldassari 2011-09-14, 19:49
Glad it's working so far. If you do see any issues, please let us know
and/or file a JIRA.

By the way, since you're using Avro 1.5.2 with Netty you can now take
advantage of asynchronous RPCs if this suits your use case. I have some
sample code out here:

https://github.com/jbaldassari/Avro-RPC

-James

On Wed, Sep 14, 2011 at 3:06 PM, Yang <[EMAIL PROTECTED]> wrote:

> yeah, I found i was actually using 1.5.1...
> updated to 1.5.2 , now it works fine so far after 1 hour
>
> Thanks a lot!
> Yang
>
> On Wed, Sep 14, 2011 at 10:56 AM, James Baldassari
> <[EMAIL PROTECTED]> wrote:
> > It appears to be pre-1.5.2 from this part of the stack trace:
> >
> >     at java.util.concurrent.Semaphore.acquire(Semaphore.java:313)
> >     at org.apache.avro.ipc.NettyTransceiver$CallFuture.get(NettyTransceiver.java:203)
> >
> > CallFuture was moved out of NettyTransceiver as part of AVRO-539 and is
> > now a stand-alone class. Also the Semaphore inside CallFuture was
> > replaced with a CountDownLatch, so in 1.5.2 and later we should never
> > see CallFuture waiting on a Semaphore.
> >
> > From your initial description it appears that some temporary network
> > disruption might have caused the connection between the client and
> > server to close, and then the client never recovered from this
> > situation. This doesn't surprise me because I don't think the pre-1.5.2
> > NettyTransceiver had any way to recover from a connection failure.
> > While working on AVRO-539 I modified the transceiver code such that it
> > would attempt to re-establish the connection if the connection was
> > lost, so that's why I think this may help you. Just a guess though.
> > But like I said, since the code has changed so much in 1.5.2 and later,
> > it will be much easier to figure out what's wrong (and fix it if
> > necessary) if you can reproduce it using 1.5.2 or later.
> >
> > -James
> >
> >
> > On Wed, Sep 14, 2011 at 1:39 PM, Yang <[EMAIL PROTECTED]> wrote:
> >>
> >> thanks James:
> >>
> >> I *think* I'm using 1.5.2, but could check to be sure.
> >> how do you determine that it is a pre-1.5.2 version?
> >>
> >> Yang
> >>
> >> On Wed, Sep 14, 2011 at 10:25 AM, James Baldassari
> >> <[EMAIL PROTECTED]> wrote:
> >> > Hi Yang,
> >> >
> >> > From the stack trace you posted it appears that you are using a
> >> > version of Avro prior to 1.5.2. Which version are you using? There
> >> > have been a number of significant changes recently to the RPC
> >> > framework and the Netty implementation in particular. Could you
> >> > please try to reproduce the problem using Avro 1.5.2 or newer? The
> >> > problem may be resolved with an upgrade. If the problem still exists
> >> > in the newer versions, it will be a lot easier to diagnose/fix it if
> >> > we can see stack traces from a post-1.5.2 version.
> >> >
> >> > Thanks,
> >> > James
> >> >
> >> >
> >> > On Wed, Sep 14, 2011 at 1:08 PM, Yang <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> I'm always seeing these "channel closed " exceptions , with low
> >> >> probability, i.e. about every 10 hours under heavy load.
> >> >>
> >> >> I'm not sure if it's the server that got the channel closed or the
> >> >> client, so I included the exception stack from both sides.
> >> >> anybody has an idea how to debug this?
> >> >>
> >> >> also, let's say it does have a valid reason for closing this, what is
> >> >> my strategy of coping with this? I originally have many
> >> >> senders, due to the channel close exception, many of them died, after
> >> >> this, only 2 application threads remain, but they
> >> >> all seem blocked on trying to grab a connection from Netty's pool, so
> >> >> even if I create new sender threads, it seems they would still
> >> >> block. so how can I tell netty to "reset/replenish " its connections?
> >> >>
> >> >>
> >> >> Thanks a lot
> >> >> Yang
> >> >>
> >> >>
> >> >> client side:
> >> >>
> >> >>
> >> >>
> >> >> WARN 16:51:02,079 Unexpected exception from downstream.
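
For readers following the thread, the CountDownLatch-based waiting that James mentions can be pictured with the minimal sketch below. This is not Avro's CallFuture: the class LatchFuture and its methods complete, fail, and get are hypothetical names invented for illustration, and the only library used is java.util.concurrent. The sketch assumes that whoever detects the closed channel calls fail, so that waiting sender threads are released with an error instead of blocking forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical one-shot future; an illustration only, not Avro's CallFuture. */
final class LatchFuture<T> {
    // The latch starts at 1 and is opened once a result or an error arrives.
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile T result;
    private volatile Throwable error;

    /** Called by the I/O thread when the response arrives. */
    void complete(T value) {
        result = value;
        done.countDown();
    }

    /** Called by the I/O thread when the call cannot finish, e.g. the channel closed. */
    void fail(Throwable cause) {
        error = cause;
        done.countDown();
    }

    /** Waits up to the given timeout, so a lost connection cannot block the caller forever. */
    T get(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (!done.await(timeout, unit)) {
            throw new TimeoutException("no response within timeout");
        }
        if (error != null) {
            throw new ExecutionException(error);
        }
        return result;
    }
}

final class LatchFutureDemo {
    public static void main(String[] args) throws Exception {
        LatchFuture<String> reply = new LatchFuture<>();
        // A second thread stands in for the network I/O thread delivering a response.
        new Thread(() -> reply.complete("pong")).start();
        System.out.println(reply.get(1, TimeUnit.SECONDS));
    }
}
```

A sender that receives an ExecutionException or TimeoutException from get can then decide to retry or to rebuild its connection, rather than staying blocked as in the original report.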