Re: How to get count of table rows using accumulo shell
You'll need to add the '-np' option on the scan command as well.
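
Combined with Jared's one-liner below, the scan would become "scan -t foo -np". The -np flag turns off the shell's output pagination, so the pipe should receive every entry instead of stopping at the first page. As a cross-check that does not rely on the iterator being set, here is a minimal sketch that counts distinct row IDs on the client. The function name count_distinct_rows is made up for illustration. It assumes the default "row cf:cq [vis] value" output format and row IDs without whitespace. Any banner or log lines the shell prints would inflate the count.

```shell
# Sketch: count distinct row IDs in Accumulo shell scan output read from stdin.
# Assumption: each entry is printed as "row cf:cq [vis]    value" and row IDs
# contain no whitespace, so the first field of each line is the row ID.
count_distinct_rows() {
  # Skip blank or single-field lines, keep the row ID, de-duplicate, count.
  awk 'NF >= 2 { print $1 }' | sort -u | wc -l | tr -d ' '
}

# Usage (username, password, and table foo are the placeholders from the thread):
# accumulo shell -u username -p password -e "scan -t foo -np" | count_distinct_rows
```

This pulls every entry to the client, so it is only practical for small test tables. With the FirstEntryInRowIterator in place, a plain "wc -l" on the scan output is the cheaper route.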

On 10/11/2013 03:05 PM, Jared Winick wrote:
> After following the commands Eric lists to set the iterator for that
> table, instead of running 'egrep' in the shell, you could do this from
> the Linux command line
>
> accumulo shell -u username -p password -e "scan -t foo" | wc -l
>
>
> On Fri, Oct 11, 2013 at 11:42 AM, Eric Newton <[EMAIL PROTECTED]> wrote:
>
>     You can stack a counting Combiner over the FirstEntryInRowIterator and
>     batch scan the table. If it's just a test data set with under a
>     billion rows, you can just count the result set coming out of the
>     FirstEntryInRowIterator.  You'll be I/O bound at the client, but it
>     will work.
>
>     This does it with the shell, but the output is kinda voluminous:
>
>     root@test> createtable foo
>     root@test foo> insert row1 cf col1 value
>     root@test foo> insert row1 cf col2 value
>     root@test foo> insert row1 cf col999 value
>     root@test foo> insert row2 cf col1 value
>     root@test foo> scan
>     row1 cf:col1 []    value
>     row1 cf:col2 []    value
>     row1 cf:col999 []    value
>     row2 cf:col1 []    value
>     root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
>     Only allows iteration over the first entry per row
>     ----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
>     root@test foo> egrep .*
>     row1 cf:col1 []    value
>     row2 cf:col1 []    value
>
>
>     On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <[EMAIL PROTECTED]> wrote:
>     > Hi guys,
>     > I'm still a bit of a newbie as I'm more of an admin than a developer, and
>     > now that formal testing has begun, I have testers asking me how to get a
>     > total count of records in Accumulo for verification purposes after test
>     > ingests have been run.
>     >
>     > In our case when I say "records" I mean the number of distinct rowkeys, not
>     > the total number of entries.
>     >
>     > Is there any way to do this using just the Accumulo shell, maybe by writing
>     > an aggregator or other class that can be run from within the Accumulo shell?
>     >
>     > Many thanks in advance,
>     > Terry
>     >
>     >
>     > On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>     >>
>     >> Greetings everyone,
>     >> I want to simply get the total count of rows in a table using the accumulo
>     >> shell.  I'm very new to Accumulo so I apologize if it's a newbie question.
>     >>
>     >> I'm prototyping with the accumulo shell, and love how it can ingest
>     >> records using execfile, so I've used python to generate a lot of test data.
>     >> For some test cases in this sprint I need to verify the rows loaded match
>     >> what's expected, hence the reason I need to get the total rows in a table.
>     >>
>     >> I'd bet there is some way to use setiter or setscaniter with the -agg
>     >> option, but I can't figure it out.
>     >>
>     >> Any help would be greatly appreciated.
>     >>
>     >> Best regards,
>     >> Terry
>     >
>     >
>
>