Accumulo >> mail # user >> Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values


Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Ah, an artifact of me just willy nilly writing an iterator :) Any reference
to `this.source` should be replaced with `this.getSource()`. In `next()`,
your workaround ends up calling `this.hasTop()` as the while loop
condition. It will always return false because two lines up we set
`top_key` to null. We need to make sure that the source iterator has a top,
because we want to read data from it. We'll have to change the loop
condition to `while(this.getSource().hasTop())`. On line 38 of your code
we'll need to call `this.getSource().next()` instead of `this.next()`.

The iterator interface is documented, but there hasn't been a definitive
go-to for making one. I've been drafting a blog post, but since it doesn't
exist yet, hopefully the following will suffice.

The lifetime of an iterator is (usually) as follows:

(1) A new instance is created via Class.newInstance (so a no-args
constructor is needed).
(2) init() is called. This allows users to configure the iterator, set its
source, and possibly check the environment. We can also call `deepCopy` on
the source if we want to have multiple sources (we'd do this if we wanted
to do a merge read out of multiple column families within a row).
(3) seek() is called. This gets our readers to the correct positions in the
data that are within the scan range the user requested, as well as turning
column families on or off. The name should be reminiscent of seeking to some
key on disk.
(4) hasTop() is called. If true, that means we have data, and the iterator
has a key/value pair that can be retrieved by calling getTopKey() and
getTopValue(). If false, we're done because there's no data to return.
(5) next() is called. This will attempt to find a new top key and value. We go
back to (4) to see if next was successful in finding a new top key/value
and will repeat until the client is satisfied or hasTop() returns false.

You can kind of make a state machine out of those steps where we loop
between (4) and (5) until there's no data. There are more advanced
workflows where next() can be reading from multiple sources, as well as
seeking them to different positions in the tablet.
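
To make the lifecycle concrete, here is a rough, self-contained sketch of
steps (1) through (5). It does not use the real Accumulo classes:
`SimpleIterator`, `MapSource`, `RowSummingIterator` and `LifecycleSketch` are
all made-up names, keys are plain "row:column" strings, and values are longs.
The point is only to show the shape of `next()` (loop on the source's
`hasTop()`, advance with the source's `next()`) and the hasTop()/next() loop
that a caller would run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for Accumulo's iterator interface.
interface SimpleIterator {
    void init(SimpleIterator source); // step (2)
    void seek(String startKey);       // step (3)
    boolean hasTop();                 // step (4)
    String getTopKey();
    long getTopValue();
    void next();                      // step (5)
}

// Bottom of the stack: walks sorted "row:column" -> long entries.
class MapSource implements SimpleIterator {
    private final List<Map.Entry<String, Long>> entries;
    private int pos;

    MapSource(TreeMap<String, Long> data) {
        entries = new ArrayList<>(data.entrySet());
        pos = entries.size(); // no top until seek() is called
    }

    public void init(SimpleIterator source) { /* nothing below us */ }

    public void seek(String startKey) {
        pos = 0;
        while (pos < entries.size() && entries.get(pos).getKey().compareTo(startKey) < 0) {
            pos++;
        }
    }

    public boolean hasTop() { return pos < entries.size(); }
    public String getTopKey() { return entries.get(pos).getKey(); }
    public long getTopValue() { return entries.get(pos).getValue(); }
    public void next() { pos++; }
}

// Sums every value in a row and emits one (row, sum) pair per row.
class RowSummingIterator implements SimpleIterator {
    private SimpleIterator source;
    private String topKey; // null means "no top"
    private long topValue;

    RowSummingIterator() { } // step (1): no-args constructor

    public void init(SimpleIterator source) { this.source = source; }

    private SimpleIterator getSource() { return source; }

    public void seek(String startKey) {
        getSource().seek(startKey);
        next(); // find the first top, if there is one
    }

    public boolean hasTop() { return topKey != null; }
    public String getTopKey() { return topKey; }
    public long getTopValue() { return topValue; }

    public void next() {
        topKey = null;
        if (!getSource().hasTop()) {
            return; // source is exhausted, so we have no top either
        }
        String row = rowOf(getSource().getTopKey());
        long sum = 0;
        // Loop on the source's hasTop(): this.hasTop() is false here
        // because topKey was just set to null.
        while (getSource().hasTop() && rowOf(getSource().getTopKey()).equals(row)) {
            sum += getSource().getTopValue();
            getSource().next(); // advance the source, not this iterator
        }
        topKey = row;
        topValue = sum;
    }

    private static String rowOf(String key) {
        return key.substring(0, key.indexOf(':'));
    }
}

// Plays the role of the caller driving the iterator through its lifetime.
class LifecycleSketch {
    static List<String> scan(TreeMap<String, Long> data, String startKey) {
        SimpleIterator iter = new RowSummingIterator(); // (1) created reflectively in practice
        iter.init(new MapSource(data));                 // (2)
        iter.seek(startKey);                            // (3)
        List<String> out = new ArrayList<>();
        while (iter.hasTop()) {                         // (4)
            out.add(iter.getTopKey() + "=" + iter.getTopValue());
            iter.next();                                // (5)
        }
        return out;
    }
}
```

With entries a:x=1, a:y=2 and b:x=5, `LifecycleSketch.scan(data, "")` would
return the two strings "a=3" and "b=5". A real iterator would work with Key,
Value and Range objects instead of strings, but the control flow should look
much the same.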
On Mon, Jul 14, 2014 at 4:51 PM, Michael Moss <[EMAIL PROTECTED]>
wrote:
 