RE: how to use CountingIterator to count records?
Hunter

If you have access to the ingest of this data, have you considered implementing an Edge Table to keep the count based on a document partition index (or similar aggregate key)? I have to track the same statistic and have moved to the Edge Table approach for a direct lookup of occurrences.
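
To make the idea concrete, here is a minimal sketch of keeping a per-partition count at ingest time so that reading the count is a direct lookup rather than a scan. It is only an in-memory model: the class `PartitionCountTable`, its methods, and the partition key strings are hypothetical names chosen for illustration, not Accumulo or Cloudbase APIs. In a real deployment the counts would presumably live in a separate table that the ingest process updates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical in-memory stand-in for a count ("edge") table.
// This is NOT an Accumulo API; it only models the bookkeeping.
class PartitionCountTable {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Call once for every record written at ingest time.
    void recordIngested(String partitionKey) {
        counts.computeIfAbsent(partitionKey, k -> new LongAdder()).increment();
    }

    // Direct lookup of the running count; no scan over the records.
    long count(String partitionKey) {
        LongAdder adder = counts.get(partitionKey);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        PartitionCountTable table = new PartitionCountTable();
        table.recordIngested("partition-0001");
        table.recordIngested("partition-0001");
        table.recordIngested("partition-0002");
        System.out.println(table.count("partition-0001")); // prints 2
    }
}
```

The trade-off is extra work on every write in exchange for a constant-time read, which only pays off if the count is requested often.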

-----Original Message-----
From: Keith Turner [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 06, 2012 13:03
To: [EMAIL PROTECTED]
Subject: Re: how to use CountingIterator to count records?

On Wed, Jun 6, 2012 at 1:46 PM, William Slacum <[EMAIL PROTECTED]> wrote:
> You're kind of there. Essentially, you can think of your Scanner's
> interactions with the TServers as a tree with a height of two. Your
> Scanner is the "root" and its children are all of the TServers it
> needs to interact with. Essentially, the operation you'd want is to
> sum the number of records each of the children has.

One comment to add. The Scanner will do this work serially, one tablet server at a time. The batch scanner would execute the iterator in parallel on multiple tablet servers at a time.

>
> In Accumulo terms, you can use something like a CountingIterator to
> count the number of results on each TServer. You can then sum all of
> those intermediate results to get a total count of results.
>
> On Wed, Jun 6, 2012 at 10:39 AM, Hunter Provyn <[EMAIL PROTECTED]> wrote:
>> I want to know the number of records a scanner has without actually
>> getting the records from Cloudbase.
>> I've been looking at CountingIterator (1.3.4), which has a getCount()
>> method.  However, I don't know how to access the instance to call
>> getCount() on it because Cloudbase server just passes back the
>> entries and doesn't expose the instance of the iterator.
>>
>> It is possible to use an AggregatingIterator to aggregate all entries
>> into a single entry whose value is the number of entries.  But I was
>> wondering if there was a better way that possibly makes use of the
>> CountingIterator class.
>>
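
The approach discussed in this thread (count on each tablet server, then sum the partial counts on the client) can be modeled with a small stand-alone sketch. It does not use the Accumulo or Cloudbase client API: each inner list stands in for the entries held by one tablet server, `countOnServer` is a hypothetical placeholder for a server-side counting iterator that sends back only a number, and the two `sum...` methods are assumed names that contrast the serial (Scanner-like) and parallel (batch-scanner-like) behavior described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model: each inner list is the set of entries on one tablet server.
class PartialCountSum {

    // Stand-in for a server-side counting iterator: only the count leaves the server.
    static long countOnServer(List<String> entries) {
        return entries.size();
    }

    // Scanner-like behavior: visit one server at a time and add up the partial counts.
    static long sumSerially(List<List<String>> servers) {
        long total = 0;
        for (List<String> server : servers) {
            total += countOnServer(server);
        }
        return total;
    }

    // Batch-scanner-like behavior: ask several servers at once, then sum the results.
    static long sumInParallel(List<List<String>> servers, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<String> server : servers) {
                Callable<Long> task = () -> countOnServer(server);
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // blocks until that server's count is ready
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> servers = Arrays.asList(
                Arrays.asList("row1", "row2"),
                Arrays.asList("row3"),
                Arrays.asList("row4", "row5", "row6"));
        System.out.println(sumSerially(servers));      // prints 6
        System.out.println(sumInParallel(servers, 3)); // prints 6
    }
}
```

Both methods return the same total; the difference is only in how many servers are doing the counting at the same moment.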