Accumulo >> mail # user >> Is it possible to use an iterator to aggregate results of a BatchScanner?


Hunter Provyn 2012-06-11, 18:21
Re: Is it possible to use an iterator to aggregate results of a BatchScanner?
So, is a global sorting order required of your iterator? That's really
the key behavioral difference in terms of output when you're dealing
with a Scanner versus a BatchScanner.

Please correct me if I'm wrong about assuming you're trying to get a
distribution for the column families that appear in a given set of
ranges.

You can count the column qualifiers per tablet/row on the server side
using an Accumulo iterator and then, as you iterate over your scanner
on the client, merge those partial counts into a map.

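{{{
import java.util.HashMap;
import java.util.Map.Entry;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

BatchScanner scan = connector.createBatchScanner(...);
// set up a column family counting/skipping iterator

// client-side running totals, one per key returned by the iterator
HashMap<Text, AtomicLong> cqCounts = new HashMap<Text, AtomicLong>();

for(Entry<Key, Value> e : scan) {
  // look up (or create) the running total for this entry
  Text cf = e.getKey().getColumnFamily();
  AtomicLong cqCount = cqCounts.get(cf);
  if(cqCount == null) {
     cqCount = new AtomicLong();
     cqCounts.put(cf, cqCount);
  }
  // each value is assumed to hold a partial count as a decimal string
  cqCount.addAndGet(Long.parseLong(new String(e.getValue().get())));
}
scan.close();
}}}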

(please excuse any old/deprecated APIs used)
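
For what it's worth, the client-side merge does not care what order the entries come back in, which is why it works with a BatchScanner. Here is a minimal sketch of just that merge step with no Accumulo dependency. The class name `PartialCountMerger`, the `entry` helper, and the sample keys are made up for illustration, and it assumes each value is a partial count encoded as a decimal string.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartialCountMerger {

    // Hypothetical helper: builds one (key, partial count) pair,
    // standing in for an entry returned by a scanner.
    public static Map.Entry<String, String> entry(String key, String count) {
        return new SimpleEntry<String, String>(key, count);
    }

    // Sums the partial counts that share a key. The result is the same
    // for any ordering of the input, so unsorted results are fine.
    public static Map<String, Long> merge(Iterable<Map.Entry<String, String>> partials) {
        Map<String, Long> totals = new TreeMap<String, Long>();
        for (Map.Entry<String, String> e : partials) {
            totals.merge(e.getKey(), Long.parseLong(e.getValue()), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Partial counts as they might arrive from different ranges/tablets.
        List<Map.Entry<String, String>> partials = Arrays.asList(
            entry("cf1", "3"),
            entry("cf2", "2"),
            entry("cf1", "5"));
        System.out.println(merge(partials)); // prints {cf1=8, cf2=2}
    }
}
```

The key set of the merged map is also the set of distinct keys across all ranges, so the same loop covers the "unique across the whole set of ranges" case on the client.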

On Mon, Jun 11, 2012 at 2:21 PM, Hunter Provyn <[EMAIL PROTECTED]> wrote:
> I have a SkippingIterator that skips entries with cq that it has seen
> before.
> It works on a Scanner, but on a BatchScanner, the iterators from different
> threads don't communicate, so the result is that results within a single
> range are unique, but across the whole set of ranges, are not unique.
> I'd prefer to perform the aggregation within the iterators if possible, but
> I don't know how.
>
> Also, thanks for your previous help, William, Keith, Bob and David.
Marc P. 2012-06-11, 20:52
Marc P. 2012-06-11, 20:54