-
Re: ROW ID Iterator - sanity check
Adam Fuchs 2012-05-20, 17:57
Since you changed the iterator() method to create a new RowIdIterator based on the old scanner, and the old scanner remembers its scan iterator configuration, each call to iterator() repeats the call to addScanIterator with the same iterator name. I would instead configure the scanner once, outside of the RowIdIterator, before you construct the first one.
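To make the "configure once, outside the wrapper" advice concrete, here is a minimal sketch that does not use Accumulo at all. FakeScanner and RowIdIterable are hypothetical stand-ins invented for illustration; FakeScanner only mimics the duplicate-name rejection reported in this thread and is not the real Scanner API.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a scanner: it remembers iterator names and rejects
// duplicates, mimicking the IllegalArgumentException reported in the thread.
final class FakeScanner implements Iterable<String> {
    private final List<String> rows;
    private final Set<String> iteratorNames = new HashSet<String>();

    FakeScanner(List<String> rows) {
        this.rows = rows;
    }

    // Registering the same name twice fails, because the scanner keeps its configuration.
    void addScanIterator(String name) {
        if (!iteratorNames.add(name)) {
            throw new IllegalArgumentException("Iterator name is already in use " + name);
        }
    }

    @Override
    public Iterator<String> iterator() {
        return rows.iterator();
    }
}

// The wrapper only reads from the scanner; it never reconfigures it, so
// iterator() can be called any number of times.
final class RowIdIterable implements Iterable<String> {
    private final FakeScanner scanner;

    RowIdIterable(FakeScanner scanner) {
        this.scanner = scanner;
    }

    @Override
    public Iterator<String> iterator() {
        return scanner.iterator();
    }
}
```

With this shape the caller does `scanner.addScanIterator("SKI98")` exactly once and then hands the scanner to the wrapper; a second registration under the same name is the only thing that fails.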
SortedKeyValueIterator is the basic interface that we use for server-side iterator implementation. Every iterator that operates in the "iterator tree" is a SortedKeyValueIterator. Bill was saying that you could write your own iterator and add it to that iterator tree to take advantage of the extra functionality that exists on the server side.
If you were to write a SortedKeyValueIterator, you would probably start out with a WrappingIterator and override the next() and seek() methods so that they can skip way ahead when you ask for the next row. Here's what that would look like:

import java.io.IOException;
import java.util.Collection;

import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.PartialKey;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.WrappingIterator;

public class RowEnumerationIterator extends WrappingIterator {

  boolean notFinished = false;
  Range originalRange;
  Collection<ByteSequence> originalColumns;
  boolean originalColumnsInclusive;

  // seek() and next() declare IOException because the underlying source's seek() can throw it
  @Override
  public void seek(Range r, Collection<ByteSequence> columns, boolean columnsInclusive) throws IOException {
    notFinished = true;
    // keep track of the original seek parameters so that we can reference them when we reseek later
    originalRange = r;
    originalColumns = columns;
    originalColumnsInclusive = columnsInclusive;
    super.seek(r, columns, columnsInclusive);
  }

  @Override
  public boolean hasTop() {
    // check our local state first, then defer to the super class
    return notFinished && super.hasTop();
  }

  @Override
  public void next() throws IOException {
    // create a range starting at the next possible row and continuing to infinity
    Range followingRange = new Range(getTopKey().followingKey(PartialKey.ROW), (Key) null);
    // intersect that new range with the original range given to our seek method
    Range intersectedRange = originalRange.clip(followingRange, true);
    // check to see if we're past the end of the original range
    if (intersectedRange == null)
      notFinished = false;
    else
      getSource().seek(intersectedRange, originalColumns, originalColumnsInclusive);
  }

  Value emptyValue = new Value(new byte[0]);

  @Override
  public Value getTopValue() {
    // replace the value with an empty value to save bandwidth
    return emptyValue;
  }
}
You'll need to add this class to the dynamic classpath (i.e. put it in a jar in the lib/ext directory of all the tablet servers), and then reference it like you did the SortedKeyIterator below.
Cheers,
Adam
-
ROW ID Iterator - sanity check
David Medinets 2012-05-19, 22:09
I wanted a program to display Row Id values in the simplest way possible. Please let me know if I have overlooked something. First, I wrapped the RowIterator like this:

package com.codebits.accumulo;

import java.util.Iterator;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.RowIterator;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;

public class RowIdIterator implements Iterator<String>, Iterable<String> {

  Scanner scanner = null;
  RowIterator iterator = null;

  public RowIdIterator(Scanner scanner) {
    super();
    this.scanner = scanner;
    this.iterator = new RowIterator(scanner);
  }

  @Override
  public boolean hasNext() {
    return iterator.hasNext();
  }

  @Override
  public String next() {
    Iterator<Entry<Key, Value>> entry = iterator.next();
    return entry.next().getKey().getRow().toString();
  }

  @Override
  public void remove() {
  }

  @Override
  public Iterator<String> iterator() {
    return this;
  }
}
And then I used a driver program like this:

package com.codebits.accumulo;

import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.security.Authorizations;

public class RowIdInteratorDriver {

  public static void main(String[] args) throws AccumuloException, AccumuloSecurityException, TableNotFoundException {
    String instanceName = "development";
    String zooKeepers = "localhost";
    String user = "root";
    byte[] pass = "password".getBytes();
    String tableName = "test_row_iterator";
    Authorizations authorizations = new Authorizations();

    ZooKeeperInstance instance = new ZooKeeperInstance(instanceName, zooKeepers);
    Connector connector = instance.getConnector(user, pass);
    Scanner scanner = connector.createScanner(tableName, authorizations);

    for (String rowId : new RowIdIterator(scanner)) {
      System.out.println("ROW ID: " + rowId);
    }
  }

}
This code works:
ROW ID: R001
ROW ID: R002
ROW ID: R003
My concern is the scanner that I am passing into the iterator. How is that testable? And, of course, the class name is confusing.
-
Re: ROW ID Iterator - sanity check
Adam Fuchs 2012-05-19, 23:49
One issue here is you are mixing Iterator and Iterable in the same object. Usually, an Iterable will return an iterator at the beginning of some logical sequence, but your iterable returns the same iterator object over and over again. This state sharing would make it so that you can really only iterate over the iterable once. In your iterator() method you might instead "return new RowIdIterator(scanner)", and that would properly separate the state of the different iterators.
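A minimal sketch of that separation, using only the JDK. RowIds is a hypothetical name, and a plain List stands in for the Scanner; the point is only that the Iterable keeps no cursor of its own and hands out a fresh Iterator on every call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the thread's RowIdIterator: the Iterable holds only
// the data source, and every call to iterator() creates fresh iteration state.
final class RowIds implements Iterable<String> {
    private final List<String> source; // stands in for the Scanner

    RowIds(List<String> source) {
        this.source = source;
    }

    @Override
    public Iterator<String> iterator() {
        // A new delegate per call, so two loops never share a cursor.
        final Iterator<String> delegate = source.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return delegate.hasNext();
            }

            @Override
            public String next() {
                return delegate.next();
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException("read-only");
            }
        };
    }

    // Collects one full pass over the iterable; handy for checking that passes repeat.
    static List<String> collect(Iterable<String> rows) {
        List<String> out = new ArrayList<String>();
        for (String row : rows) {
            out.add(row);
        }
        return out;
    }
}
```

Calling `RowIds.collect(rows)` twice on the same object yields the same list both times, which is exactly what fails when iterator() returns `this`.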
To test this, you could construct a unit test that starts with a MockInstance, adds some data, then checks to see that the row ids come out as expected with code similar to your main method.
We can also talk about how to make this more efficient with an iterator if you like.
Cheers,
Adam
-
Re: ROW ID Iterator - sanity check
David Medinets 2012-05-20, 04:03
> We can also talk about how to make this more efficient with an iterator if you like.
I would. How can it be more efficient?
On Sat, May 19, 2012 at 7:49 PM, Adam Fuchs <[EMAIL PROTECTED]> wrote:
> One issue here is you are mixing Iterator and Iterable in the same object.
Good point. I've fixed that, and I also looked at MockInstance. I've changed my code to the following, which does seem to work. I have some code to handle column family iteration, but I want to hear about efficiency before I post it.
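package com.codebits.accumulo;

import java.io.IOException;

import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.TableExistsException;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.client.mock.MockInstance;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.security.Authorizations;

public class IteratorTestDriver {

  public static void main(String[] args) throws IOException, AccumuloException, AccumuloSecurityException, TableExistsException, TableNotFoundException {
    // in-memory instance, so no running cluster is needed
    Instance mock = new MockInstance("development");
    Connector connector = mock.getConnector("root", "password".getBytes());
    connector.tableOperations().create("TABLEA");

    // write 1000 rows, one entry each
    BatchWriter wr = connector.createBatchWriter("TABLEA", 10000000, 10000, 5);
    for (int i = 0; i < 1000; ++i) {
      Mutation m = new Mutation("row_" + i);
      m.put("cf_" + i, "cq_" + 1, "val_" + 1);
      wr.addMutation(m);
    }
    wr.close();

    Scanner scanner = connector.createScanner("TABLEA", new Authorizations());

    for (String rowId : new RowIdIterator(scanner)) {
      System.out.println("ROW ID: " + rowId);
    }
  }
}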
-
Re: ROW ID Iterator - sanity check
William Slacum 2012-05-20, 04:18
A SortedKeyValue implementation would allow you to skip across rows server side, potentially saving you lots of reads and network traffic.
-
Re: ROW ID Iterator - sanity check
William Slacum 2012-05-20, 04:19
Excuse me, I meant a SortedKeyValueIterator implementation :)
-
Re: ROW ID Iterator - sanity check
David Medinets 2012-05-20, 16:48
Searching through the source for SortedKeyIterator shows that it is used in 15 files. The FindMax class seems to be a fine example of its use:

  IteratorSetting cfg = new IteratorSetting(Integer.MAX_VALUE, SortedKeyIterator.class);
  scanner.addScanIterator(cfg);

That seems simple enough, but when I change my code accordingly I get this message:

  Exception in thread "main" java.lang.IllegalArgumentException: Iterator name is already in use SKI98
    at org.apache.accumulo.core.client.impl.ScannerOptions.addScanIterator(ScannerOptions.java:67)
    at com.codebits.accumulo.RowIdIterator.<init>(RowIdIterator.java:22)

My code change was trivial:

  Iterator<Entry<Key, Value>> iterator = null;

  public RowIdIterator(Scanner scanner) {
    super();
    this.scanner = scanner;
    IteratorSetting cfg = new IteratorSetting(Integer.MAX_VALUE, "SKI98", SortedKeyIterator.class);
    scanner.addScanIterator(cfg); // line 22 in the stack trace
    this.iterator = scanner.iterator();
  }

  @Override
  public String next() {
    Entry<Key, Value> entry = iterator.next();
    return entry.getKey().getRow().toString();
  }

As you can see, the name is unlikely to be in use already.
-
Re: ROW ID Iterator - sanity check
Billie J Rinaldi 2012-05-20, 17:56
Are you calling new RowIdIterator(scanner) multiple times for the same scanner?
The SortedKeyIterator is a step in the right direction because it ignores Values. You could improve things further with a custom iterator that does a seek to the beginning of the next row whenever you call next().
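The benefit of seeking to the next row can be illustrated without Accumulo. The sketch below is a simulation only: RowSkipSketch is a hypothetical class, keys are plain "row + NUL + column" strings held in a TreeSet standing in for a tablet's sorted key space, and rows are assumed not to contain the NUL character. Each step does one ceiling() lookup past the current row instead of visiting every column in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simulation only: none of this is Accumulo API. Keys are "row NUL column"
// strings in a sorted set, and rows are assumed not to contain NUL.
final class RowSkipSketch {
    private static final char SEP = '\u0000';

    // Builds a sortable key from a row and a column.
    static String key(String row, String column) {
        return row + SEP + column;
    }

    // Returns each distinct row once, jumping past the remaining columns of a
    // row with a single ceiling() lookup instead of stepping through them.
    static List<String> distinctRows(TreeSet<String> sortedKeys) {
        List<String> rows = new ArrayList<String>();
        String current = sortedKeys.isEmpty() ? null : sortedKeys.first();
        while (current != null) {
            String row = current.substring(0, current.indexOf(SEP));
            rows.add(row);
            // row followed by the character after NUL sorts after every key of
            // this row and before any key of a later row.
            String firstPossibleKeyAfterRow = row + (char) (SEP + 1);
            current = sortedKeys.ceiling(firstPossibleKeyAfterRow);
        }
        return rows;
    }
}
```

For a table with many columns per row, the number of lookups grows with the number of rows rather than the number of entries, which is the saving a server-side seek is meant to provide.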
Billie