Accumulo >> mail # dev >> Combiner Iterator


Re: Combiner Iterator
Devin,

You don't build the Iterator<Value> yourself; creating it is the "magic"
that Accumulo does under the hood. The Combiner is a neat construct because
it performs this reduction server-side, so your client code doesn't need to.
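
If you still want to call reduce by hand from client code (in a unit test, say), one option is a small adapter that exposes only the values of an entry iterator. This is a minimal sketch in plain Java with no Accumulo classes; ValueIteratorSketch and valuesOf are hypothetical names, not part of any Accumulo API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ValueIteratorSketch {

  /** Wraps an entry iterator so that it yields only the values. */
  public static <K, V> Iterator<V> valuesOf(final Iterator<Map.Entry<K, V>> entries) {
    return new Iterator<V>() {
      @Override
      public boolean hasNext() {
        return entries.hasNext();
      }

      @Override
      public V next() {
        // Each call consumes exactly one entry and drops its key.
        return entries.next().getValue();
      }

      @Override
      public void remove() {
        throw new UnsupportedOperationException();
      }
    };
  }

  public static void main(String[] args) {
    // Stand-in for scan results: two versions of the same key.
    List<Map.Entry<String, String>> entries = new ArrayList<>();
    entries.add(new SimpleEntry<>("A cf1:cq1", "1"));
    entries.add(new SimpleEntry<>("A cf1:cq1", "2"));

    Iterator<String> values = valuesOf(entries.iterator());
    while (values.hasNext()) {
      System.out.println(values.next());
    }
  }
}
```

Passing valuesOf(scan.iterator()) to reduce treats every Value in the scan range as a single group, so it is only meaningful when the range covers the versions of one Key.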

When you use a Combiner with a (Batch)Scanner or configure it on a table,
you specify which columns it applies to (a whole column family, specific
colfam:colqual pairs, or all columns). Accumulo then calls reduce once per
distinct Key, ignoring the timestamp, and hands it all of the Values stored
under that row+colfam+colqual (and visibility).

Concretely, say you had a table with the following:

A cf1:cq1 []    1
A cf1:cq1 []    2
A cf1:cq2 []    3
A cf1:cq2 []    4
A cf1:cq3 []    5
A cf1:cq3 []    6

With a SummingCombiner set over cf1 (or over all columns), the Combiner
would see three separate Iterator<Value>'s, over [1,2], [3,4], and then
[5,6], one per reduce call. Values are not combined across different column
qualifiers, so you would not get a single Iterator<Value> over
[1,2,3,4,5,6].
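
To make the grouping concrete, here is a minimal sketch in plain Java (no Accumulo classes) of what happens to the table above when entries that share the same row, column family and column qualifier are handed to reduce as one group. CombinerGroupingSketch, combine and the String-based key are stand-ins I made up for illustration; the real work happens inside the tablet server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerGroupingSketch {

  /** Stand-in for a summing reduce: adds up the values of one group. */
  public static long reduce(String key, Iterator<Long> iter) {
    long sum = 0;
    while (iter.hasNext()) {
      sum += iter.next();
    }
    return sum;
  }

  /**
   * Groups {key, value} rows by key and calls reduce once per group.
   * The key string stands for row + column family + column qualifier.
   */
  public static Map<String, Long> combine(List<String[]> rows) {
    Map<String, List<Long>> groups = new LinkedHashMap<>();
    for (String[] row : rows) {
      groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(Long.parseLong(row[1]));
    }

    Map<String, Long> combined = new LinkedHashMap<>();
    for (Map.Entry<String, List<Long>> group : groups.entrySet()) {
      // This iterator plays the role of the Iterator<Value> passed to reduce.
      combined.put(group.getKey(), reduce(group.getKey(), group.getValue().iterator()));
    }
    return combined;
  }

  public static void main(String[] args) {
    List<String[]> table = Arrays.asList(
        new String[] {"A cf1:cq1", "1"},
        new String[] {"A cf1:cq1", "2"},
        new String[] {"A cf1:cq2", "3"},
        new String[] {"A cf1:cq2", "4"},
        new String[] {"A cf1:cq3", "5"},
        new String[] {"A cf1:cq3", "6"});

    // Prints {A cf1:cq1=3, A cf1:cq2=7, A cf1:cq3=11}
    System.out.println(combine(table));
  }
}
```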

You can also find some more info at
http://accumulo.apache.org/1.5/accumulo_user_manual.html#_combiners

On Wed, Sep 18, 2013 at 9:23 AM, Devin Pinkston <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I am trying to work with the example combiner iterator through Java code
> instead of the jar or shell.  My question is: how do I pass the
> Iterator<Value> in to the reduce method?  Usually I would create a Key
> Value Iterator, but this requires an Iterator over just the Value, with
> the key passed in separately.  The reduce method comes from
> the StatsCombiner class under examples/simple/combiner.  What I have
> right now:
>
>     Iterator<Map.Entry<Key, Value>> iterator = scan.iterator();
>     Iterator<Value> iter;
>
>     while (iterator.hasNext()) {
>         Map.Entry<Key, Value> entry = iterator.next();
>         iter = iterator.next().getValue();
>         Key key = entry.getKey();
>         Value value = entry.getValue();
>         reduce(key, iter);
>     }
>
> How would I create the Iterator<Value>?  Every way I have tried has
> led me to a dead end or an error at runtime.
>
> Thanks!
>
> Here is the reduce method from StatsCombiner.java:
>
>   @Override
>   public Value reduce(Key key, Iterator<Value> iter) {
>
>     long min = Long.MAX_VALUE;
>     long max = Long.MIN_VALUE;
>     long sum = 0;
>     long count = 0;
>
>     while (iter.hasNext()) {
>       String stats[] = iter.next().toString().split(",");
>
>       if (stats.length == 1) {
>         long val = Long.parseLong(stats[0], radix);
>         min = Math.min(val, min);
>         max = Math.max(val, max);
>         sum += val;
>         count += 1;
>       } else {
>         min = Math.min(Long.parseLong(stats[0], radix), min);
>         max = Math.max(Long.parseLong(stats[1], radix), max);
>         sum += Long.parseLong(stats[2], radix);
>         count += Long.parseLong(stats[3], radix);
>       }
>     }
>
>     String ret = Long.toString(min, radix) + "," + Long.toString(max, radix)
>         + "," + Long.toString(sum, radix) + "," + Long.toString(count, radix);
>     return new Value(ret.getBytes());
>   }
>
>
>
>
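
One detail of the quoted reduce method worth spelling out: it accepts both raw values ("5") and previously combined "min,max,sum,count" strings, because a Combiner can run at scan time and at compaction time, so its own output may come back as input. Below is a plain-Java port of the same logic that you can run without Accumulo. StatsReduceSketch is a hypothetical name, it works on String instead of Value, and RADIX is fixed to 10 as an assumption (the quoted code reads it from a field named radix).

```java
import java.util.Arrays;
import java.util.Iterator;

public class StatsReduceSketch {

  // Assumption: base 10. The quoted code uses a field called radix instead.
  static final int RADIX = 10;

  /** Same logic as the quoted reduce, but over Strings instead of Values. */
  public static String reduce(Iterator<String> iter) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    long sum = 0;
    long count = 0;

    while (iter.hasNext()) {
      String[] stats = iter.next().split(",");

      if (stats.length == 1) {
        // A raw value that has not been combined yet.
        long val = Long.parseLong(stats[0], RADIX);
        min = Math.min(val, min);
        max = Math.max(val, max);
        sum += val;
        count += 1;
      } else {
        // A previously combined "min,max,sum,count" record.
        min = Math.min(Long.parseLong(stats[0], RADIX), min);
        max = Math.max(Long.parseLong(stats[1], RADIX), max);
        sum += Long.parseLong(stats[2], RADIX);
        count += Long.parseLong(stats[3], RADIX);
      }
    }

    return Long.toString(min, RADIX) + "," + Long.toString(max, RADIX)
        + "," + Long.toString(sum, RADIX) + "," + Long.toString(count, RADIX);
  }

  public static void main(String[] args) {
    // First pass over two raw values.
    String partial = reduce(Arrays.asList("1", "2").iterator());
    System.out.println(partial); // prints 1,2,3,2

    // Later pass: the earlier result plus one new raw value.
    String total = reduce(Arrays.asList(partial, "5").iterator());
    System.out.println(total); // prints 1,5,8,3
  }
}
```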