Accumulo >> mail # user >> Filtering rows by presence of keys


Re: Filtering rows by presence of keys
One of the differences you'll see between WholeRowIterator and RowFilter is
that WholeRowIterator buffers an entire row in memory while RowFilter does
not. Each includes a boolean method that you would override in a subclass
-- acceptRow(...) in RowFilter or filter(...) in WholeRowIterator. In this
case, I think the acceptRow(...) method would be easier for you to
implement, it might be more efficient, and you wouldn't have to worry about
buffering too much in memory. Here's how I would write it:

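import java.io.IOException;
import java.util.Collections;

import org.apache.accumulo.core.data.ArrayByteSequence;
import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
import org.apache.accumulo.core.iterators.user.RowFilter;

public class AwesomeIterator extends RowFilter {
  ...
  @Override
  public boolean acceptRow(SortedKeyValueIterator<Key,Value> rowIterator)
      throws IOException {
    // The seek gets "clipped" to the row in question, so we can use an
    // infinite range and look for anything in the "ACTIVE" column family.
    rowIterator.seek(new Range(),
        Collections.singleton((ByteSequence) new ArrayByteSequence("ACTIVE")), true);
    return rowIterator.hasTop();
  }
}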
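
To make the buffering-versus-seeking trade-off concrete, here is a rough, self-contained model in plain Java. It does not use any Accumulo classes: the table is a TreeMap, and the class name RowPresenceSketch, the key(...) helper, and the NUL-separated key layout are all made up for illustration (and assume rows and families contain no NUL characters). It only sketches the idea that one approach copies the whole row before deciding while the other jumps straight to the column family of interest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

public class RowPresenceSketch {

  // Hypothetical flattened key "row\0family\0qualifier"; NOT Accumulo's Key class.
  public static String key(String row, String family, String qualifier) {
    return row + '\0' + family + '\0' + qualifier;
  }

  // WholeRowIterator-like: copy every key of the row into a buffer, then inspect it.
  public static boolean acceptBuffered(SortedMap<String, String> table,
                                       String row, String family) {
    // All keys of the row fall in ["row\0", "row\1").
    List<String> buffered =
        new ArrayList<>(table.subMap(row + '\0', row + '\1').keySet());
    for (String k : buffered) {
      int first = k.indexOf('\0');
      int second = k.indexOf('\0', first + 1);
      if (k.substring(first + 1, second).equals(family)) {
        return true;
      }
    }
    return false;
  }

  // RowFilter-like: "seek" to the family inside the row and check whether
  // anything is there, without copying the rest of the row.
  public static boolean acceptBySeek(SortedMap<String, String> table,
                                     String row, String family) {
    String start = row + '\0' + family + '\0';
    String end = row + '\0' + family + '\1';
    return !table.subMap(start, end).isEmpty();
  }
}
```

Both methods should agree on which rows are accepted; the difference is that acceptBuffered holds a copy of every key in the row, which is the memory concern mentioned above for very wide rows.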
Cheers,
Adam
On Tue, May 22, 2012 at 12:56 PM, John Armstrong <[EMAIL PROTECTED]> wrote:

> On 05/22/2012 12:46 PM, [EMAIL PROTECTED] wrote:
>
>> IntersectingIterator is designed to reduce a dataset to a common column
>> qualifier for a collection of column families.  So I presume your mental
>> picture (like mine was for a long time) is inverted relative to the logic of that
>> iterator.  You might try another type...like RowFilter.
>>
>
> Adding a filter to the WholeRowIterator has been suggested, and I'm trying
> that.  I'm also pushing for an upgrade from 1.3.4 to 1.4.x, but that may be
> harder going.
>