HBase user mailing list

Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)
I did more research and found the issue.

The TableInputFormat creates an HTable using a new Configuration object and never cleans it up. When a Mapper runs, the TableInputFormat is instantiated and a ZK connection is created. Although this connection is not explicitly cleaned up, the Mapper process eventually exits, which closes the connection. Ideally, the TableRecordReader would close the connection in its close() method rather than relying on process exit for cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat so that it uses the new record reader.
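A minimal sketch of that ownership pattern follows. It is a self-contained model, not HBase code: `TableHandle` is a hypothetical stand-in for an HTable holding a ZK connection, and `ClosingRecordReader` is a hypothetical reader, not the real TableRecordReader API. It only shows the reader owning its table and releasing it in close(), which the framework calls once the split has been read.

```java
import java.io.Closeable;

public class RecordReaderSketch {
    // Hypothetical stand-in for HTable; counts handles that still hold a ZK connection.
    static final class TableHandle implements Closeable {
        static int open = 0;
        private boolean closed = false;

        TableHandle() { open++; }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                open--;
            }
        }
    }

    // Hypothetical reader: owns its table handle and releases it in close()
    // instead of relying on the task process to exit.
    static final class ClosingRecordReader implements Closeable {
        private final TableHandle table;
        private int remainingRows;

        ClosingRecordReader(TableHandle table, int rows) {
            this.table = table;
            this.remainingRows = rows;
        }

        // Real scanning omitted; this just counts down the simulated rows.
        boolean nextKeyValue() {
            return remainingRows-- > 0;
        }

        @Override
        public void close() {
            table.close();
        }
    }

    // Reads every row the way the framework would (iterate, then close())
    // and returns how many table handles are still open afterwards.
    static int openHandlesAfterReading(int rows) {
        TableHandle.open = 0;
        try (ClosingRecordReader reader = new ClosingRecordReader(new TableHandle(), rows)) {
            while (reader.nextKeyValue()) {
                // process the current row here
            }
        }
        return TableHandle.open;
    }

    public static void main(String[] args) {
        System.out.println("open handles: " + openHandlesAfterReading(10)); // prints "open handles: 0"
    }
}
```

In a real subclass, close() would presumably need to release the connection tied to the table's own Configuration (the copy TableInputFormat created), using whatever call your HBase version provides; that detail is an assumption and is not shown here.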

The leak occurs when the JobClient is initializing and needs to retrieve the splits. To get the splits, it instantiates a TableInputFormat, which creates a ZK connection that is never cleaned up. Unlike the Mapper, however, my job client process does not exit, so the ZK connections accumulate.

I was able to fix the problem by writing my own TableInputFormat that does not create the HTable in the setConf() method and does not have an HTable member variable. Instead, it stores the table name. The HTable is instantiated where needed and then cleaned up: in getSplits(), I create the HTable and close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I use a record reader that closes the connection when it is done.
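The difference between the stock behaviour and this fix can be modelled without a cluster. In the sketch below, `Table` is a hypothetical stand-in for HTable that counts open connections, and both input formats are simplified, hypothetical versions (the real setConf() takes a Configuration, not a table name, and "mytable" is a made-up name). Simulating ten split computations from one long-lived client shows the eager, member-variable version ending with ten open connections and the scoped version with none.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class InputFormatSketch {
    // Hypothetical stand-in for HTable: each instance holds one ZK connection until closed.
    static final class Table implements Closeable {
        static int openConnections = 0;
        private final String name;
        private boolean closed = false;

        Table(String name) {
            this.name = name;
            openConnections++;
        }

        // Hypothetical: pretend every table has three regions, one split per region.
        List<String> regionSplits() {
            List<String> splits = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                splits.add(name + "-region-" + i);
            }
            return splits;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                openConnections--;
            }
        }
    }

    interface SplitSource {
        void setConf(String tableName);
        List<String> getSplits();
    }

    // Eager pattern: the table is opened in setConf() and kept as a field, never closed.
    static final class LeakyInputFormat implements SplitSource {
        private Table table;
        public void setConf(String tableName) { table = new Table(tableName); }
        public List<String> getSplits() { return table.regionSplits(); }
    }

    // Scoped pattern: only the name is stored; getSplits() opens the table and closes it.
    static final class ScopedInputFormat implements SplitSource {
        private String tableName;
        public void setConf(String tableName) { this.tableName = tableName; }
        public List<String> getSplits() {
            try (Table table = new Table(tableName)) {
                return table.regionSplits();
            }
        }
    }

    // Simulates a job client that computes splits for `runs` jobs without exiting,
    // then reports how many connections are still open.
    static int openConnectionsAfter(Supplier<SplitSource> factory, int runs) {
        Table.openConnections = 0;
        for (int i = 0; i < runs; i++) {
            SplitSource format = factory.get();
            format.setConf("mytable");
            format.getSplits();
        }
        return Table.openConnections;
    }

    static int leakyOpenConnections(int runs) {
        return openConnectionsAfter(LeakyInputFormat::new, runs);
    }

    static int scopedOpenConnections(int runs) {
        return openConnectionsAfter(ScopedInputFormat::new, runs);
    }

    public static void main(String[] args) {
        System.out.println("leaky:  " + leakyOpenConnections(10));  // prints "leaky:  10"
        System.out.println("scoped: " + scopedOpenConnections(10)); // prints "scoped: 0"
    }
}
```

Storing only the table name keeps the input format cheap to instantiate, which matters here because the job client creates one for every submission.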

Calling HConnectionManager.deleteAllConnections() is not desirable in my case, as I may have some connections that I do not want deleted.
On Apr 16, 2011, at 3:56 AM, Ted Yu wrote:

> I think you should call this method of HTablePool:
>   public void closeTablePool(final String tableName) throws IOException
> Actually, you only use HTablePool in populateTable(); HTable should be enough
> for you.
> I have logged https://issues.apache.org/jira/browse/HBASE-3791 for ease of
> debugging.
> I think if you place this call:
> HConnectionManager.deleteAllConnections(true);
> on line 52 before calling obj.wait(), the situation should be different.
> Cheers
> On Fri, Apr 15, 2011 at 11:56 PM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>> FWIW, I created a test program that demonstrates the issue. The program
>> creates an HBase table, populates it with 10 rows, then runs a simple
>> map-reduce job 10 times in succession, and then goes into a wait state. The
>> test uses gradle so you'll need to download that.
>> Before running, telnet to Zookeeper and type 'stats' to get the
>> connections. Then run the program using 'gradle run'. Finally, telnet to
>> Zookeeper again and type 'stats' to get the connections.
>> I'd be interested to see if others are seeing the same behavior I am.
>> You can download the code here:
>> http://www.vancameron.net/HBaseMR.zip
>> I'll open a JIRA issue after I do a little more research into the problem.
>> On Apr 15, 2011, at 4:19 PM, Ted Yu wrote:
>>> Bryan:
>>> Thanks for reporting this issue.
>>> TableOutputFormat.TableRecordWriter calls the following in close():
>>>     HConnectionManager.deleteAllConnections(true);
>>> But there is no such call in TableInputFormat / TableInputFormatBase /
>>> TableRecordReader.
>>> Do you mind filing a JIRA?
>>> On Fri, Apr 15, 2011 at 3:41 PM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>>>> I am having this same problem. After every run of my map-reduce job which
>>>> uses TableInputFormat, I am leaking one ZK connection. The connections that
>>>> are not being cleaned up are connected to the node that submitted the job,
>>>> not the cluster nodes.
>>>> I tried explicitly cleaning up the connection using
>>>> HConnectionManager.deleteConnection(config, true) after the job runs, but
>>>> this has no effect. ZK still retains one connection per job run and never
>>>> releases it. Eventually I run out of ZK connections even if I set maxCnxns
>>>> very high (e.g. 600).