Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Hi Mike!

A Combiner only aggregates the values of keys within a single row.
You can probably get away with implementing your combining logic in a
WrappingIterator that reads across all the rows in a given tablet.

To do some combine/fold/reduce operation, Accumulo needs the input type to
be the same as the output type. The combiner doesn't have a notion of a
"present" type (as you'd see in something like Algebird's Groups), but you
can use another iterator to perform your transformation.

If you wanted to extract the "count" field from your Avro object, you could
write a new Iterator that took your Avro object, extracted the desired
field, and returned it as its top value. You can then set this iterator as
the source of the aggregator, either programmatically or by wrapping the
source object passed to the aggregator in its
SortedKeyValueIterator#init call.

This is a bit inefficient as you'd have to serialize to a Value and then
immediately deserialize it in the iterator above it. You could mitigate
this by exposing a method that would get the extracted value before
serializing it.
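
To make the layering above concrete, here is a minimal, library-free sketch.
It deliberately does not use the Accumulo API: java.util.Iterator stands in
for SortedKeyValueIterator, byte[] stands in for Value, and Event,
CountExtractor, CountSummer and nextCount are hypothetical names invented
for illustration. Counts are assumed to be serialized as decimal strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical stand-in for the Avro record; only the "count" field matters here. */
final class Event {
    final String name;
    final long count;

    Event(String name, long count) {
        this.name = name;
        this.count = count;
    }
}

/** Lower layer: wraps a source of records and exposes only the extracted count. */
final class CountExtractor implements Iterator<byte[]> {
    private final Iterator<Event> source;

    CountExtractor(Iterator<Event> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    /** Shortcut: advances and returns the extracted count without serializing it. */
    long nextCount() {
        return source.next().count;
    }

    /** Generic path: advances and returns the count serialized as a decimal string. */
    @Override
    public byte[] next() {
        return Long.toString(nextCount()).getBytes(StandardCharsets.UTF_8);
    }
}

/** Upper layer: folds counts, so its input and output are both plain longs. */
final class CountSummer {
    /** Deserializes every value produced by the layer below, then sums. */
    static long sumSerialized(Iterator<byte[]> source) {
        long total = 0;
        while (source.hasNext()) {
            total += Long.parseLong(new String(source.next(), StandardCharsets.UTF_8));
        }
        return total;
    }

    /** Uses the shortcut method to skip the serialize/deserialize round trip. */
    static long sumDirect(CountExtractor source) {
        long total = 0;
        while (source.hasNext()) {
            total += source.nextCount();
        }
        return total;
    }
}
```

In a real iterator stack the extractor would more likely extend
WrappingIterator and override getTopValue, and the summing layer could check
whether its source is the extractor before taking the shortcut path.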

This kind of counting also requires client side logic to do a final combine
operation, since the aggregations from all the tservers are partial results.
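
The final client-side combine could look roughly like the sketch below. It
is plain Java rather than Accumulo client code: PartialCounts and combine
are hypothetical names, and each partial result is assumed to arrive as a
decimal string in UTF-8 bytes (in a real client these bytes would probably
come from the Value of each scanned entry).

```java
import java.nio.charset.StandardCharsets;

final class PartialCounts {
    /**
     * Sums the partial counts returned by the servers.
     * Assumes each element is a decimal string encoded as UTF-8.
     * Throws ArithmeticException if the total overflows a long.
     */
    static long combine(Iterable<byte[]> partials) {
        long total = 0;
        for (byte[] partial : partials) {
            long count = Long.parseLong(new String(partial, StandardCharsets.UTF_8));
            total = Math.addExact(total, count);
        }
        return total;
    }
}
```

Because addition is associative and commutative, the order in which the
partial results arrive does not affect the total.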

I believe that CountingIterator is not meant for user consumption, but I do
not know if it's related to your issue in trying to use it from the shell.
In previous versions of Accumulo, iterators set through the shell had to
implement OptionDescriber. Many default iterators do not implement it, and
thus can't be set in the shell.

On Mon, Jul 14, 2014 at 2:44 PM, Michael Moss <[EMAIL PROTECTED]>
wrote: