Accumulo, mail # user - How to get count of table rows using accumulo shell

Re: How to get count of table rows using accumulo shell
Eric Newton 2013-10-11, 17:42
You can stack a counting Combiner over the FirstEntryInRowIterator and
batch scan the table. If it's only a test data set with under a
billion rows, you can simply count the entries coming out of the
FirstEntryInRowIterator. You'll be I/O-bound at the client, but it
will work.
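
To make the difference between the two approaches concrete, here is a plain-Python model of them. This is a minimal sketch, not Accumulo code: `first_entry_in_row`, `count_rows_at_client` and `count_rows_per_tablet` are hypothetical helpers, entries are assumed to arrive sorted by row (as a tablet returns them), and each "tablet" is assumed to hold a disjoint range of rows so per-tablet counts can simply be summed.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

# (row, column family, column qualifier, value): a simplified stand-in
# for an Accumulo Key/Value pair (hypothetical, for illustration only)
Entry = Tuple[str, str, str, str]


def first_entry_in_row(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Yield only the first entry of each row, mimicking FirstEntryInRowIterator.

    Assumes entries are sorted by row, as a tablet server returns them.
    """
    for _row, group in groupby(entries, key=lambda e: e[0]):
        yield next(group)


def count_rows_at_client(entries: Iterable[Entry]) -> int:
    """Client-side approach: one entry per row is shipped back and counted."""
    return sum(1 for _ in first_entry_in_row(entries))


def count_rows_per_tablet(tablets: List[List[Entry]]) -> int:
    """Server-side idea: each tablet reduces its rows to one partial count,
    so only one number per tablet reaches the client, which sums them.
    Assumes tablets hold disjoint row ranges, so no row is counted twice."""
    partial_counts = [count_rows_at_client(t) for t in tablets]
    return sum(partial_counts)


# Same data as the shell session: four entries spread over two rows.
table = [
    ("row1", "cf", "col1", "value"),
    ("row1", "cf", "col2", "value"),
    ("row1", "cf", "col999", "value"),
    ("row2", "cf", "col1", "value"),
]

print(len(table))                                     # → 4 (entries)
print(count_rows_at_client(table))                    # → 2 (distinct rows)
print(count_rows_per_tablet([table[:3], table[3:]]))  # → 2
```

The per-tablet variant shows why pushing the count to the servers scales better: the client receives one number per tablet instead of one entry per row, which is the client-side I/O bottleneck described above.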

This does it with the shell, but the output is kinda voluminous, since it prints one line per row:

root@test> createtable foo
root@test foo> insert row1 cf col1 value
root@test foo> insert row1 cf col2 value
root@test foo> insert row1 cf col999 value
root@test foo> insert row2 cf col1 value
root@test foo> scan
row1 cf:col1 []    value
row1 cf:col2 []    value
row1 cf:col999 []    value
row2 cf:col1 []    value
root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
Only allows iteration over the first entry per row
----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
root@test foo> egrep .*
row1 cf:col1 []    value
row2 cf:col1 []    value
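
If the shell output is captured to text, the row count can be taken from it instead of read by eye. A minimal sketch, assuming the captured text contains only result lines printed as `row family:qualifier [visibility]    value` and that row IDs contain no whitespace; `count_rows_in_output` is a hypothetical helper, not part of Accumulo.

```python
def count_rows_in_output(shell_output: str) -> int:
    """Count distinct row IDs in captured Accumulo shell scan/egrep output.

    Assumes each non-blank line is one entry whose first whitespace-separated
    field is the row ID (so row IDs must not contain spaces).
    """
    rows = set()
    for line in shell_output.splitlines():
        fields = line.split()
        if fields:               # skip blank lines
            rows.add(fields[0])  # the row ID is the first field
    return len(rows)


# Output of `egrep .*` from the session, with FirstEntryInRowIterator set.
captured = """\
row1 cf:col1 []    value
row2 cf:col1 []    value
"""
print(count_rows_in_output(captured))  # → 2
```

With FirstEntryInRowIterator set, every line is already a distinct row, so a plain line count would suffice; collecting distinct first fields also gives the right answer on unfiltered `scan` output, at the cost of holding every row ID in memory.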

On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <[EMAIL PROTECTED]> wrote:
> Hi guys,
> I'm still a bit of a newbie as I'm more of an admin than a developer, and
> now that formal testing has begun, I have testers asking me how to get a
> total count of records in Accumulo for verification purposes after test
> ingests have been run.
>
> In our case when I say "records" I mean the number of distinct rowkeys, not
> the total number of entries.
>
> Is there any way to do this using just the Accumulo shell, maybe by writing
> an aggregator or other class that can be run from within the Accumulo shell?
>
> Many thanks in advance,
> Terry
>
>
> On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>
>> Greetings everyone,
>> I want to simply get the total count of rows in a table using the accumulo
>> shell.  I'm very new to Accumulo so I apologize if it's a newbie question.
>>
>> I'm prototyping with the accumulo shell, and love how it can ingest
>> records using execfile, so I've used Python to generate a lot of test data.
>> For some test cases in this sprint I need to verify the rows loaded match
>> what's expected, hence the reason I need to get the total rows in a table.
>>
>> I'd bet there is some way to use setiter or setscaniter with the -agg
>> option, but I can't figure it out.
>>
>> Any help would be greatly appreciated.
>>
>> Best regards,
>> Terry
>
>