Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase >> mail # user >> HBaseClient isn't reusing connections but creating a new one each time


+
Jeff Whiting 2013-03-29, 18:41
+
Ted Yu 2013-03-29, 19:00
+
ramkrishna vasudevan 2013-03-29, 19:40
+
Jeff Whiting 2013-03-29, 19:44
+
Ted Yu 2013-03-29, 19:55
+
Jeff Whiting 2013-03-29, 20:05
Copy link to this message
-
Re: HBaseClient isn't reusing connections but creating a new one each time
Hi Jeff,

Yeah that is pretty bad.  User should definitely be implementing equals()
and hashCode().  Thanks for tracking this down and reporting it.

I opened https://issues.apache.org/jira/browse/HBASE-8222
Gary
On Fri, Mar 29, 2013 at 11:41 AM, Jeff Whiting <[EMAIL PROTECTED]> wrote:

> After noticing a lot of threads, I turned on debugging logging for hbase
> client and saw this many times counting up constantly:
> HBaseClient:531 - IPC Client (687163870) connection to
> /10.1.37.21:60020from jeff: starting, having connections 1364
>
> At that point in my code it was up to 1364 different connections (and
> threads).  Those connections will eventually drop off after the idle time
> is reached "conf.getInt("hbase.ipc.client.connection.maxidletime", 10000)".
> But during periods of activity the number of threads can get very high.
>
> Additionally I was able to confirm the large number of threads by doing:
>
> jstack <pid> | grep IPC
>
>
> So I started digging around in the code...
>
> In HBaseClient.getConnection it attempts to reuse previous connections:
>
>  ConnectionId remoteId = new ConnectionId(addr, protocol, ticket,
> rpcTimeout);
>     do {
>       synchronized (connections) {
>         connection = connections.get(remoteId);
>         if (connection == null) {
>           LOG.error("poolsize: "+getPoolSize(conf));
>           connection = new Connection(remoteId);
>           connections.put(remoteId, connection);
>         }
>       }
>     } while (!connection.addCall(call));
>
>
> It does this by using the connection id as the key to the pool. All of this
> seems good except ConnectionId never hashes to the same value so it cannot
> reuse any connection.
>
> From my understanding of the code here is why.
>
> In HBaseClient.ConnectionId
>
>     @Override
>     public boolean equals(Object obj) {
>      if (obj instanceof ConnectionId) {
>        ConnectionId id = (ConnectionId) obj;
>        return address.equals(id.address) && protocol == id.protocol &&
>               ((ticket != null && ticket.equals(id.ticket)) ||
>                (ticket == id.ticket)) && rpcTimeout == id.rpcTimeout;
>      }
>      return false;
>     }
>
>     @Override  // simply use the default Object#hashcode() ?
>     public int hashCode() {
>       return (address.hashCode() + PRIME * (
>                   PRIME * System.identityHashCode(protocol) ^
>              (ticket == null ? 0 : ticket.hashCode()) )) ^ rpcTimeout;
>     }
>
> It uses the protocol and the ticket in the both functions.  However going
> back through all of the layers I think I found the problem.
>
> Problem:
>
> HBaseRPC.java:  public static VersionedProtocol getProxy(Class<? extends
> VersionedProtocol> protocol,
>       long clientVersion, InetSocketAddress addr, Configuration conf,
>       SocketFactory factory, int rpcTimeout) throws IOException {
>     return getProxy(protocol, clientVersion, addr,
>         User.getCurrent(), conf, factory, rpcTimeout);
>   }
>
> User.getCurrent() always returns a new User object.  That user instance is
> eventually passed down to ConnectionId.  However the User object doesn't
> implement hash() or equals() so one ConnectionId won't ever match another
> ConnectionId.
>
>
> There are several possible solutions.
> 1. implement hashCode and equals for the User.
> 2. only create one User object and reuse it.
> 3. don't look at ticket in ConnectionId (probably a bad idea)
>
>
> Thoughts?  Has anyone else noticed this behavior?  Should I open up a jira
> issue?
>
> I originally ran into the problem due to OS X having a limited number of
> threads per user (and I was not able to increase the limit) and our unit
> tests making requests quick enough that I ran out of threads.  I tried out
> all three solutions and it worked fine for my application.  However I'm not
> sure what changing the behavior would do to other's applications especially
> those that use SecureHadoop.
>
>
>
> Thanks,
> ~Jeff
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB