HBase, mail # user - resource usage of ResultScanner's Iterator<Result>


Re: resource usage of ResultScanner's Iterator<Result>
Stack 2012-10-26, 19:59
On Thu, Oct 25, 2012 at 1:24 AM, Oliver Meyn (GBIF) <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm on cdh3u3 (hbase 0.90.4) and I need to provide a bunch of row keys based on a column value (e.g. give me all keys where column "dataset" = 1234).  That's straightforward using a scan and filter.  The trick is that I want to return an Iterator over my key type (Integer) rather than expose HBase internals (i.e. Result), so I need some kind of Decorator that wraps the Iterator<Result>.  For every call to next() I'd then call the underlying iterator's next() and extract my Integer key from the Result.  That all works fine, but what I'm wondering is what resources the Iterator<Result> is holding, and how I can release those from my decorator.
>
> In my current implementation the decorator's constructor looks like:
>
> public OccurrenceKeyIterator(HTablePool tablePool, String occurrenceTableName, Scan scan)
>
> and the constructor builds the ResultScanner and subsequent iterator.  In my hasNext() method I can check the underlying iterator and if it says false I can shut down my scanner and return the table to the TablePool. But what if the end-user never reaches the end of the Iterator, or just dereferences it? Am I at risk of leaking tables, connections or anything else?  Any tips on what I should do?
>

If close() is not called, this is what gets skipped on the HTable instance:
    flushCommits();
    if (cleanupPoolOnClose) {
      this.pool.shutdown();
    }
    if (cleanupConnectionOnClose) {
      if (this.connection != null) {
        this.connection.close();
      }
    }
    this.closed = true;
In your case the flushCommits() doesn't matter, since you are only scanning.

The pool above is an executor service inside of HTable, used for doing
batch calls.  Again, you don't really use it, but it should probably
get cleaned up.

The connection close matters: although all HTables share a
Connection, the above close updates reference counters, and that is
how we know when we can let go of the connection.
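
To make the reference-counting point concrete, here is a toy model in plain Java. It is not HBase's actual code: RefCountedConnection, acquire() and release() are made-up names, and the real teardown is replaced by a flag. It only shows why a handle that never closes keeps the shared connection alive.

```java
/**
 * Hypothetical stand-in for a shared connection; NOT the HBase implementation.
 * Each table handle calls acquire() when created and release() when closed.
 */
final class RefCountedConnection {
    private int refs;          // number of handles currently holding the connection
    private boolean released;  // true once the last holder has let go

    /** Register one more holder of the shared connection. */
    synchronized void acquire() {
        if (released) {
            throw new IllegalStateException("connection already released");
        }
        refs++;
    }

    /** Drop one holder; the connection is torn down only when none remain. */
    synchronized void release() {
        if (refs == 0) {
            return; // nothing to release; ignore the extra call
        }
        refs--;
        if (refs == 0) {
            released = true; // a real connection would close its sockets here
        }
    }

    synchronized boolean isReleased() {
        return released;
    }
}
```

If one of two holders never calls release(), the count stays at 1 and isReleased() stays false for the life of the process.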

Keep a list of what you've given out and, if one goes unused for N
minutes, close it yourself in the background?
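
A rough sketch of that idea, written against plain java.io.Closeable so it stays independent of any HBase version. IdleReaper and TrackedIterator are hypothetical names; the Closeable passed in is assumed to be whatever closes your scanner and hands the table back to the pool. It uses lambdas, so it needs a newer JDK than the one that era of HBase ran on.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical tracker: closes handed-out resources that have sat idle too long. */
final class IdleReaper implements Closeable {
    private final Map<Closeable, Long> lastUsed = new ConcurrentHashMap<>();
    private final long maxIdleNanos;
    private final ScheduledExecutorService sweeper;

    IdleReaper(long maxIdle, TimeUnit unit) {
        this.maxIdleNanos = unit.toNanos(maxIdle);
        this.sweeper = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-reaper");
            t.setDaemon(true); // do not keep the JVM alive just for sweeping
            return t;
        });
        long period = Math.max(1L, maxIdleNanos / 2);
        sweeper.scheduleWithFixedDelay(
            () -> sweep(System.nanoTime()), period, period, TimeUnit.NANOSECONDS);
    }

    /** Record that the resource was just used. */
    void touch(Closeable c) { lastUsed.put(c, System.nanoTime()); }

    /** Stop tracking a resource that was closed normally. */
    void forget(Closeable c) { lastUsed.remove(c); }

    /** Close everything whose last use is at least maxIdle before nowNanos. */
    void sweep(long nowNanos) {
        for (Map.Entry<Closeable, Long> e : lastUsed.entrySet()) {
            boolean idle = nowNanos - e.getValue() >= maxIdleNanos;
            // The two-argument remove skips entries touched since we read them.
            if (idle && lastUsed.remove(e.getKey(), e.getValue())) {
                try { e.getKey().close(); } catch (IOException ignored) { }
            }
        }
    }

    @Override public void close() { sweeper.shutdownNow(); }
}

/** Decorator that closes its resource when exhausted, closed, or reaped. */
final class TrackedIterator<T> implements Iterator<T>, Closeable {
    private final Iterator<T> delegate;
    private final Closeable resource; // assumed: closes scanner, returns table
    private final IdleReaper reaper;
    private final AtomicBoolean closed = new AtomicBoolean();

    TrackedIterator(Iterator<T> delegate, Closeable resource, IdleReaper reaper) {
        this.delegate = delegate;
        this.resource = resource;
        this.reaper = reaper;
        reaper.touch(this);
    }

    @Override public boolean hasNext() {
        if (closed.get()) return false;
        reaper.touch(this);
        if (delegate.hasNext()) return true;
        close(); // exhausted: release eagerly
        return false;
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return delegate.next();
    }

    @Override public void close() {
        if (!closed.compareAndSet(false, true)) return; // already closed
        reaper.forget(this);
        try { resource.close(); } catch (IOException ignored) { }
    }
}
```

An iterator that is reaped simply reports hasNext() == false afterwards; whether that or an exception is the better signal to a slow caller is a design choice left open here.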

Good on you Oliver (when you fellas going to upgrade?)

St.Ack