Avro, mail # user - Netty/Avro IPC problem: channel closed


Re: Netty/Avro IPC problem: channel closed
James Baldassari 2011-09-14, 19:49
Glad it's working so far.  If you do see any issues, please let us know
and/or file a JIRA.

By the way, since you're using Avro 1.5.2 with Netty you can now take
advantage of asynchronous RPCs if this suits your use case.  I have some
sample code out here: https://github.com/jbaldassari/Avro-RPC

-James
On Wed, Sep 14, 2011 at 3:06 PM, Yang <[EMAIL PROTECTED]> wrote:

> Yeah, I found I was actually using 1.5.1...
> I updated to 1.5.2, and it has been working fine for an hour so far.
>
> Thanks a lot!
> Yang
>
> On Wed, Sep 14, 2011 at 10:56 AM, James Baldassari
> <[EMAIL PROTECTED]> wrote:
> > It appears to be pre-1.5.2 from this part of the stack trace:
> >
> >        at java.util.concurrent.Semaphore.acquire(Semaphore.java:313)
> >        at org.apache.avro.ipc.NettyTransceiver$CallFuture.get(NettyTransceiver.java:203)
> >
> > CallFuture was moved out of NettyTransceiver as part of AVRO-539 and is now
> > a stand-alone class.  Also, the Semaphore inside CallFuture was replaced with
> > a CountDownLatch, so in 1.5.2 and later we should never see CallFuture
> > waiting on a Semaphore.
> >
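To illustrate why swapping a Semaphore for a CountDownLatch matters here, the following is a minimal sketch of a one-shot result holder built on a latch. It is not Avro's CallFuture: the class name LatchFuture and its methods are hypothetical, and it uses only java.util.concurrent. The point it shows is that a latch stays open once counted down, so every waiter returns (including a second call to get), and a channel-close handler can release blocked callers by failing the future instead of leaving them parked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical one-shot result holder; an illustration only, not Avro's CallFuture. */
class LatchFuture<T> {
    // Counted down exactly once, when a result or an error arrives.
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile T result;
    private volatile Throwable error;

    /** Called by the I/O thread when the response arrives. */
    void complete(T value) {
        result = value;
        latch.countDown();
    }

    /** Called by the I/O thread when the call fails, e.g. because the channel closed. */
    void fail(Throwable cause) {
        error = cause;
        latch.countDown();
    }

    /** Waits up to the given timeout; never blocks forever. */
    T get(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (!latch.await(timeout, unit)) {
            throw new TimeoutException("no response within timeout");
        }
        if (error != null) {
            throw new ExecutionException(error);
        }
        return result;
    }
}
```

With a Semaphore, each acquire consumes a permit, so a second waiter on the same result would block unless someone releases again; the latch avoids that bookkeeping.
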
> > From your initial description it appears that some temporary network
> > disruption might have caused the connection between the client and server to
> > close, and then the client never recovered from this situation.  This
> > doesn't surprise me because I don't think the pre-1.5.2 NettyTransceiver had
> > any way to recover from a connection failure.  While working on AVRO-539 I
> > modified the transceiver code such that it would attempt to re-establish the
> > connection if the connection was lost, so that's why I think this may help
> > you.  Just a guess though.  But like I said, since the code has changed so
> > much in 1.5.2 and later, it will be much easier to figure out what's wrong
> > (and fix it if necessary) if you can reproduce it using 1.5.2 or later.
> >
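For senders that need to survive a closed channel at the application level, one possible coping strategy is to wrap each call in a reconnect-and-retry loop. The sketch below is only an assumption about how that could look; it does not show what NettyTransceiver does internally. The Connection interface, the ReconnectingCaller class, and the backoff parameter are all hypothetical names, and only the Java standard library is used.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical connection abstraction standing in for a transceiver-like object. */
interface Connection {
    boolean isOpen();
    void connect() throws IOException;
}

/** Hypothetical helper: re-establishes the connection and retries a failed call. */
class ReconnectingCaller {
    private final Connection conn;
    private final int maxAttempts;
    private final long backoffMillis;

    ReconnectingCaller(Connection conn, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.conn = conn;
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    /** Runs the call, reconnecting first if the connection is closed. */
    <T> T call(Callable<T> rpc) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (!conn.isOpen()) {
                    conn.connect();
                }
                return rpc.call();
            } catch (IOException e) {
                last = e; // treat I/O failures as retryable
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // simple linear backoff
                }
            }
        }
        throw last; // all attempts failed; surface the last I/O error
    }
}
```

Only IOException is retried here; any other exception propagates immediately, so application-level errors are not masked by the retry loop.
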
> > -James
> >
> >
> > On Wed, Sep 14, 2011 at 1:39 PM, Yang <[EMAIL PROTECTED]> wrote:
> >>
> >> thanks James:
> >>
> >> I *think* I'm using 1.5.2, but could check to be sure.
> >> how do you determine that it is a pre-1.5.2 version?
> >>
> >> Yang
> >>
> >> On Wed, Sep 14, 2011 at 10:25 AM, James Baldassari
> >> <[EMAIL PROTECTED]> wrote:
> >> > Hi Yang,
> >> >
> >> > From the stack trace you posted it appears that you are using a version of
> >> > Avro prior to 1.5.2.  Which version are you using?  There have been a number
> >> > of significant changes recently to the RPC framework and the Netty
> >> > implementation in particular.  Could you please try to reproduce the problem
> >> > using Avro 1.5.2 or newer?  The problem may be resolved with an upgrade.  If
> >> > the problem still exists in the newer versions, it will be a lot easier to
> >> > diagnose/fix it if we can see stack traces from a post-1.5.2 version.
> >> >
> >> > Thanks,
> >> > James
> >> >
> >> >
> >> > On Wed, Sep 14, 2011 at 1:08 PM, Yang <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> I keep seeing these "channel closed" exceptions, but rarely: about
> >> >> once every 10 hours under heavy load.
> >> >>
> >> >> I'm not sure whether it's the server or the client that closed the
> >> >> channel, so I included the exception stack from both sides.
> >> >> Does anybody have an idea how to debug this?
> >> >>
> >> >> Also, supposing there is a valid reason for closing the channel, what
> >> >> should my strategy be for coping with it? I originally had many
> >> >> senders, and many of them died because of the channel closed exception.
> >> >> Now only 2 application threads remain, and they both seem blocked
> >> >> trying to grab a connection from Netty's pool, so even if I create new
> >> >> sender threads, it seems they would still block. How can I tell Netty
> >> >> to "reset/replenish" its connections?
> >> >>
> >> >>
> >> >> Thanks a lot
> >> >> Yang
> >> >>
> >> >>
> >> >> client side:
> >> >>
> >> >>
> >> >>
> >> >>  WARN 16:51:02,079 Unexpected exception from downstream.