Accumulo, mail # user - ROW ID Iterator - sanity check


Re: ROW ID Iterator - sanity check
Adam Fuchs 2012-05-20, 17:57
Since you changed the iterator() method to create a new RowIdIterator based
on the old scanner, and the old scanner remembers its scan iterator
configuration, each time you call iterator() you end up repeating the call
to addScanIterator. The repeated call registers the same name a second
time, which is why you see "Iterator name is already in use SKI98". I would
instead configure the scanner once, outside of RowIdIterator, before you
construct the first one.
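
To make that failure mode concrete, here is a stdlib-only Java simulation
rather than Accumulo code. FakeScanner, ConfiguringWrapper,
RowIdIteratorSketch and ConfigureOnceDemo are hypothetical names; FakeScanner
models only the duplicate-name check behind the "Iterator name is already in
use" error quoted below, which is a simplification of the real scanner.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a Scanner (NOT the Accumulo API): it remembers the
// names of the scan iterators added to it and rejects a duplicate name.
class FakeScanner {
  private final Set<String> iteratorNames = new LinkedHashSet<>();

  void addScanIterator(String name) {
    // Set.add returns false when the name is already present
    if (!iteratorNames.add(name)) {
      throw new IllegalArgumentException("Iterator name is already in use " + name);
    }
  }

  int iteratorCount() {
    return iteratorNames.size();
  }
}

// Hypothetical wrapper showing the problem: configuring inside the constructor
// means every new wrapper over the same scanner re-adds the same setting.
class ConfiguringWrapper {
  ConfiguringWrapper(FakeScanner scanner) {
    scanner.addScanIterator("SKI98");
  }
}

// Hypothetical wrapper showing the fix: it only uses the scanner, so creating
// a fresh one from iterator() never touches the scanner's configuration.
class RowIdIteratorSketch {
  private final FakeScanner scanner;

  RowIdIteratorSketch(FakeScanner scanner) {
    this.scanner = scanner;
  }

  RowIdIteratorSketch iterator() {
    return new RowIdIteratorSketch(scanner);
  }
}

class ConfigureOnceDemo {
  static FakeScanner run() {
    FakeScanner scanner = new FakeScanner();
    // configure the scanner exactly once, before any wrapper exists
    scanner.addScanIterator("SKI98");
    RowIdIteratorSketch first = new RowIdIteratorSketch(scanner);
    first.iterator(); // safe to call repeatedly
    first.iterator();
    return scanner;
  }
}
```

Calling new ConfiguringWrapper(scanner) twice on the same FakeScanner throws
on the second call, while ConfigureOnceDemo.run() leaves exactly one iterator
registered no matter how often iterator() is called.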

SortedKeyValueIterator is the basic interface that we use for server-side
iterator implementations. Every iterator that operates in the "iterator
tree" is a SortedKeyValueIterator. Bill was saying that you could write
your own iterator and add it to that iterator tree to take advantage of
extra functionality that exists on the server side, such as reseeking the
underlying data directly.

If you were to write a SortedKeyValueIterator, you would probably start out
with a WrappingIterator and override the seek() and next() methods so that
next() can reseek its source straight to the following row instead of
stepping through every remaining key in the current one. Here's what that
would look like:
import java.io.IOException;
import java.util.Collection;

import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.PartialKey;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.WrappingIterator;

public class RowEnumerationIterator extends WrappingIterator {

  boolean notFinished = false;
  Range originalRange;
  Collection<ByteSequence> originalColumns;
  boolean originalColumnsInclusive;

  @Override
  public void seek(Range r, Collection<ByteSequence> columns, boolean columnsInclusive)
      throws IOException
  {
    notFinished = true;
    // keep track of the original seek parameters so that we can reference them when we reseek later
    originalRange = r;
    originalColumns = columns;
    originalColumnsInclusive = columnsInclusive;
    super.seek(r, columns, columnsInclusive);
  }

  @Override
  public boolean hasTop()
  {
    // check our local state first, then defer to the super class
    return notFinished && super.hasTop();
  }

  @Override
  public void next() throws IOException
  {
    // create a range starting at the next possible row and continuing to infinity
    Range followingRange = new Range(getTopKey().followingKey(PartialKey.ROW), (Key) null);
    // intersect that new range with the original range given to our seek method
    Range intersectedRange = originalRange.clip(followingRange, true);
    // check to see if we're past the end of the original range
    if (intersectedRange == null)
      notFinished = false;
    else
      getSource().seek(intersectedRange, originalColumns, originalColumnsInclusive);
  }

  Value emptyValue = new Value(new byte[0]);

  @Override
  public Value getTopValue()
  {
    // replace the value with an empty value to save bandwidth
    return emptyValue;
  }
}

You'll need to add this class to the dynamic classpath (i.e. put it in a
jar in the lib/ext directory of all the tablet servers), and then reference
it like you did the SortedKeyIterator below.
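
The reseek trick in next() can be tried without a cluster. The sketch below
is a simplified, stdlib-only model, not Accumulo code: SimpleKey, MapSource
and RowEnumerator are hypothetical, keys carry only a row and a column, and
the range clipping and column-family filtering of the real iterator are left
out. followingRowKey() plays the role of Key.followingKey(PartialKey.ROW) by
appending a zero character to the row, which yields the smallest key of any
later row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical simplified key: a row and a column, ordered by row, then column.
final class SimpleKey implements Comparable<SimpleKey> {
  final String row;
  final String column;

  SimpleKey(String row, String column) {
    this.row = row;
    this.column = column;
  }

  // Smallest key sorting after every key in this row (row + '\0', empty column),
  // analogous in spirit to Key.followingKey(PartialKey.ROW).
  SimpleKey followingRowKey() {
    return new SimpleKey(row + "\0", "");
  }

  @Override
  public int compareTo(SimpleKey other) {
    int byRow = row.compareTo(other.row);
    return byRow != 0 ? byRow : column.compareTo(other.column);
  }
}

// Hypothetical leaf source: a sorted map that can be (re)seeked; counts seeks.
class MapSource {
  private final NavigableMap<SimpleKey, String> data;
  private SimpleKey top; // current position, null once exhausted
  int seekCount = 0;

  MapSource(NavigableMap<SimpleKey, String> data) {
    this.data = data;
  }

  void seek(SimpleKey start) {
    seekCount++;
    top = data.ceilingKey(start); // first key >= start, or null
  }

  boolean hasTop() {
    return top != null;
  }

  SimpleKey getTopKey() {
    return top;
  }
}

// Sketch of the RowEnumerationIterator idea: next() jumps past the current row
// with one seek instead of stepping through that row's remaining entries.
class RowEnumerator {
  private final MapSource source;

  RowEnumerator(MapSource source) {
    this.source = source;
  }

  void seek(SimpleKey start) {
    source.seek(start);
  }

  boolean hasTop() {
    return source.hasTop();
  }

  String getTopRow() {
    return source.getTopKey().row;
  }

  void next() {
    source.seek(source.getTopKey().followingRowKey());
  }

  // Collects each distinct row once, starting from the smallest possible key.
  static List<String> allRows(MapSource source) {
    RowEnumerator it = new RowEnumerator(source);
    List<String> rows = new ArrayList<>();
    for (it.seek(new SimpleKey("", "")); it.hasTop(); it.next()) {
      rows.add(it.getTopRow());
    }
    return rows;
  }
}
```

In this model each row costs one seek however many columns it holds, which
is where the savings come from on wide rows. The trade-off is that a seek
generally costs more than a single next(), so for very narrow rows plain
stepping may do just as well.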

Cheers,
Adam
On Sun, May 20, 2012 at 12:49 PM, David Medinets
<[EMAIL PROTECTED]> wrote:

> Searching through the source for SortedKeyIterator shows that it is
> used in 15 files. The FindMax class seems to be a fine example of its
> use:
>
>    IteratorSetting cfg = new IteratorSetting(Integer.MAX_VALUE, SortedKeyIterator.class);
>    scanner.addScanIterator(cfg);
>
> That seems simple enough, but when I change my code accordingly I get a
> message:
>
>  Exception in thread "main" java.lang.IllegalArgumentException: Iterator name is already in use SKI98
>        at org.apache.accumulo.core.client.impl.ScannerOptions.addScanIterator(ScannerOptions.java:67)
>        at com.codebits.accumulo.RowIdIterator.<init>(RowIdIterator.java:22)
>
> My code change was trivial:
>
>        Iterator<Entry<Key, Value>> iterator = null;
>
>        public RowIdIterator(Scanner scanner) {
>                super();
>                this.scanner = scanner;
>                IteratorSetting cfg = new IteratorSetting(Integer.MAX_VALUE, "SKI98", SortedKeyIterator.class);