Accumulo >> mail # user >> Is it possible to use an iterator to aggregate results of a BatchScanner?


Re: Is it possible to use an iterator to aggregate results of a BatchScanner?
So, is a global sorting order required of your iterator? That's really
the key behavioral difference in terms of output when you're dealing
with a Scanner versus a BatchScanner.

Please correct me if I'm wrong about assuming you're trying to get a
distribution for the column families that appear in a given set of
ranges.

You can count the column qualifiers on a per tablet/row basis server
side using an Accumulo iterator, and as you iterate over your scanner,
you can merge those counts using a map.

{{{
BatchScanner scan = connector.createBatchScanner(...);
// set up a column family counting/skipping iterator on the scanner here

// running totals, merged client side across all tablets/ranges
HashMap<Text, AtomicLong> cqCounts = new HashMap<Text, AtomicLong>();

for (Entry<Key, Value> e : scan) {
  Text cf = e.getKey().getColumnFamily();
  AtomicLong cqCount = cqCounts.get(cf);
  if (cqCount == null) {
    cqCount = new AtomicLong();
    cqCounts.put(cf, cqCount);
  }
  // assumes the iterator emits each partial count as a decimal string
  cqCount.addAndGet(Long.parseLong(new String(e.getValue().get())));
}
scan.close();
}}}

(please excuse any old/deprecated APIs used)
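
If it helps to try the merge step without a running Accumulo instance, here is a minimal plain-Java sketch of the same idea. The class name PartialCountMerger and its merge and entry helpers are hypothetical, not part of any Accumulo API. The input list stands in for what a BatchScanner might return, on the assumption that the server-side iterator emits one partial count per key as a decimal string, in no particular order.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartialCountMerger {

    // Sums the partial counts for each key. Arrival order does not matter,
    // which is why this works with unordered BatchScanner-style results.
    public static Map<String, Long> merge(List<Map.Entry<String, byte[]>> partials) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, byte[]> e : partials) {
            // Assumption: each value is a count encoded as a decimal string.
            long partial = Long.parseLong(new String(e.getValue(), StandardCharsets.UTF_8));
            totals.merge(e.getKey(), partial, Long::sum);
        }
        return totals;
    }

    // Hypothetical helper that builds one simulated (key, encoded count) entry.
    public static Map.Entry<String, byte[]> entry(String key, long count) {
        byte[] encoded = Long.toString(count).getBytes(StandardCharsets.UTF_8);
        return new SimpleEntry<>(key, encoded);
    }

    public static void main(String[] args) {
        // Partial counts for the same keys, as if they came from different tablets.
        List<Map.Entry<String, byte[]>> partials = new ArrayList<>();
        partials.add(entry("cfA", 3));
        partials.add(entry("cfB", 1));
        partials.add(entry("cfA", 4));
        partials.add(entry("cfB", 2));

        System.out.println(merge(partials));
    }
}
```

The point of the sketch is that the per-range results never need to see each other: each one only has to be correct for its own range, and the client-side map does the global aggregation.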

On Mon, Jun 11, 2012 at 2:21 PM, Hunter Provyn <[EMAIL PROTECTED]> wrote:
> I have a SkippingIterator that skips entries with cq that it has seen
> before.
> It works on a Scanner, but on a BatchScanner, the iterators from different
> threads don't communicate, so the result is that results within a single
> range are unique, but across the whole set of ranges, are not unique.
> I'd prefer to perform the aggregation within the iterators if possible, but
> I don't know how.
>
> Also, thanks for your previous help, William, Keith, Bob and David.