Hive, mail # user - Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors


Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Ashutosh Chauhan 2013-08-29, 18:53
Thanks Agatea for digging in. Seems like you have hit a bug. Would you mind
opening a JIRA and adding your findings to it?

Thanks,
Ashutosh
On Thu, Aug 29, 2013 at 11:22 AM, agateaaa <[EMAIL PROTECTED]> wrote:

> Sorry hit send too soon ...
>
> Hi All:
>
> Put some debugging code in TUGIContainingTransport.getTransport() and I
> tracked it down to
>
> @Override
> public TUGIContainingTransport getTransport(TTransport trans) {
>
>   // UGI information is not available at connection setup time; it will be
>   // set later via the set_ugi() RPC.
>   transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));
>
>   // return transMap.get(trans);  // <- changed to the lines below
>   TUGIContainingTransport retTrans = transMap.get(trans);
>
>   if (retTrans == null) {
>     LOGGER.error("cannot find transport that was in map !!");
>   } else {
>     LOGGER.debug("found transport in map");
>   }
>   return retTrans;
> }
>
> When we run this in our test environment, we see that we hit the problem
> just after GC runs, and the "cannot find transport that was in map !!"
> message gets logged.
>
> Could the GC be collecting entries from transMap just before we get them?
>
> Tried a minor change which seems to work:
>
> public TUGIContainingTransport getTransport(TTransport trans) {
>
>   TUGIContainingTransport retTrans = transMap.get(trans);
>
>   if (retTrans == null) {
>     // UGI information is not available at connection setup time; it will be
>     // set later via the set_ugi() RPC.
>     // Keep a strong reference in retTrans so the GC cannot clear the new
>     // value before it is returned.
>     retTrans = new TUGIContainingTransport(trans);
>     transMap.putIfAbsent(trans, retTrans);
>   }
>   return retTrans;
> }
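
[Editor's sketch] The idea behind the change above, holding a strong reference to the value for the whole lookup so a weakly held map entry cannot disappear between the put and the get, can be illustrated with the JDK alone. This is a minimal sketch under assumptions, not the Hive code: `WeakValueCache` and `Wrapper` are hypothetical names, the map is a plain `ConcurrentHashMap` with `WeakReference` values (keys are held strongly here, unlike `MapMaker().weakKeys()`), and the retry loop is one possible way to handle a concurrent writer.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class WeakValueCache {

    /** Hypothetical stand-in for TUGIContainingTransport. */
    static final class Wrapper {
        final Object wrapped;

        Wrapper(Object wrapped) {
            this.wrapped = wrapped;
        }
    }

    // Values are only weakly reachable through the map, so the GC may clear
    // a value at any time unless the caller holds a strong reference to it.
    private final ConcurrentMap<Object, WeakReference<Wrapper>> map =
            new ConcurrentHashMap<>();

    Wrapper getOrCreate(Object key) {
        while (true) {
            WeakReference<Wrapper> ref = map.get(key);
            // Copy the referent into a local: this strong reference keeps
            // the value alive until we return it.
            Wrapper existing = (ref == null) ? null : ref.get();
            if (existing != null) {
                return existing;
            }

            // Create the value first and keep it in a local variable, so it
            // stays strongly reachable while it is being published.
            Wrapper created = new Wrapper(key);
            WeakReference<Wrapper> fresh = new WeakReference<>(created);

            boolean installed = (ref == null)
                    ? map.putIfAbsent(key, fresh) == null  // no entry yet
                    : map.replace(key, ref, fresh);        // swap a cleared entry
            if (installed) {
                return created;
            }
            // Another thread changed the entry in between; look again.
        }
    }
}
```

As long as a caller keeps the returned `Wrapper`, later calls with the same key return the same instance; once nothing else references it, the entry's value may be cleared and a new wrapper is created on the next call.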
>
>
> My questions for Hive and Thrift experts:
>
> 1.) Do we need to use a ConcurrentMap?
> ConcurrentMap<TTransport, TUGIContainingTransport> transMap = new
> MapMaker().weakKeys().weakValues().makeMap();
> It uses == to compare keys (which might be the problem). Also, with weak
> keys and values we can't rely on the entry for trans still being in
> transMap, even right after a put, so the change above probably makes sense.
>
>
> 2.) Is it a better idea to use a WeakHashMap with WeakReference values
> instead? (I was looking at org.apache.thrift.transport.TSaslServerTransport,
> especially the change made by THRIFT-1468.)
>
> e.g.
> private static Map<TTransport, WeakReference<TUGIContainingTransport>>
> transMap3 = Collections.synchronizedMap(new WeakHashMap<TTransport,
> WeakReference<TUGIContainingTransport>>());
>
> getTransport() would be something like:
>
> public TUGIContainingTransport getTransport(TTransport trans) {
>   WeakReference<TUGIContainingTransport> ref = transMap3.get(trans);
>   // Hold a strong reference so the GC cannot clear it before we return.
>   TUGIContainingTransport tugiTrans = (ref == null) ? null : ref.get();
>   if (tugiTrans == null) {
>     tugiTrans = new TUGIContainingTransport(trans);
>     // No need for putIfAbsent(): concurrent calls to getTransport()
>     // will pass in different TTransports.
>     transMap3.put(trans, new WeakReference<TUGIContainingTransport>(tugiTrans));
>   }
>   return tugiTrans;
> }
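
[Editor's sketch] On the key-comparison point raised in question 1: the difference between `==` (identity) and `equals()` lookups only matters when two distinct key objects compare equal. The sketch below shows that difference with the JDK's `HashMap` and `IdentityHashMap`; `KeyComparisonDemo` is a hypothetical name. Whether it applies to `TTransport` keys depends on whether the transport classes override `equals()`, which is not established in this thread.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class KeyComparisonDemo {

    /** Inserts two equal-but-distinct keys and reports how many entries result. */
    static int countEntries(Map<String, String> map) {
        String k1 = new String("conn"); // distinct objects on purpose,
        String k2 = new String("conn"); // but k1.equals(k2) is true
        map.put(k1, "first");
        map.put(k2, "second");
        return map.size();
    }

    public static void main(String[] args) {
        // equals()-based map: the second put overwrites the first entry.
        System.out.println(countEntries(new HashMap<>()));
        // identity-based map: each distinct key object gets its own entry.
        System.out.println(countEntries(new IdentityHashMap<>()));
    }
}
```

For key classes that inherit `Object.equals()`, both kinds of map behave the same, since the default `equals()` is itself an identity comparison.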
>
>
> I did try 1.) above in our test environment and it does seem to resolve the
> problem, though I am not sure whether I am introducing any other problem.
>
>
> Can someone help?
>
>
> Thanks
> Agatea
>
> On Thu, Aug 29, 2013 at 10:57 AM, agateaaa <[EMAIL PROTECTED]> wrote:
>
> > Hi All:
> >
> > Put some debugging code in TUGIContainingTransport.getTransport() and I
> > tracked it down to
> >
> > @Override
> > public TUGIContainingTransport getTransport(TTransport trans) {
> >
> > // UGI information is not available at connection setup time, it will be
> > set later
> > // via set_ugi() rpc.
> > transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));
> >
> > //return transMap.get(trans); <-change
> >           TUGIContainingTransport retTrans = transMap.get(trans);
> >
> >           if ( retTrans == null ) {
> >
> >
> >
> > }
> >
> >
> >
> >
> >
> > On Wed, Jul 31, 2013 at 9:48 AM, agateaaa <[EMAIL PROTECTED]> wrote:
> >
> >> Thanks Nitin
> >>
> >> There aren't too many connections in CLOSE_WAIT state, only one or two,
> >> when we run into this. Most likely it's because of a dropped connection.