Accumulo, mail # user - How to get count of table rows using accumulo shell


Re: How to get count of table rows using accumulo shell
Billie Rinaldi 2013-10-11, 19:41
It may be the case that temporary scan-specific iterators (which are set
with the setscaniter command) are not applied to the grep or egrep
commands.  It should work with the scan command, though.
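
For readers following the counting idea in this thread, here is a minimal Python sketch (a model only, not Accumulo code) of why keeping the first entry of each row turns "number of entries returned" into "number of distinct rowkeys". It assumes entries arrive sorted by row, as Accumulo returns them. The tuple layout and the names `first_entry_per_row` and `count_rows` are illustrative assumptions, not part of any Accumulo API.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple

# Hypothetical stand-in for an Accumulo entry:
# (row, column family, column qualifier, value)
Entry = Tuple[str, str, str, str]


def first_entry_per_row(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Yield only the first entry of each row.

    Assumes the input is already sorted by row, so all entries of a row
    are adjacent; groupby only merges consecutive equal keys.
    """
    for _row, group in groupby(entries, key=lambda entry: entry[0]):
        yield next(group)


def count_rows(entries: Iterable[Entry]) -> int:
    """Count distinct rows by counting the filtered entries."""
    return sum(1 for _ in first_entry_per_row(entries))


# Same data as the `foo` table in the quoted shell session.
SAMPLE = [
    ("row1", "cf", "col1", "value"),
    ("row1", "cf", "col2", "value"),
    ("row1", "cf", "col999", "value"),
    ("row2", "cf", "col1", "value"),
]

if __name__ == "__main__":
    print(count_rows(SAMPLE))  # prints 2
```

With the iterator in place, each line of scan output corresponds to one row, which is why piping the output to a line counter gives the row count rather than the entry count.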
On Fri, Oct 11, 2013 at 12:24 PM, Eric Newton <[EMAIL PROTECTED]> wrote:

> Ya, you'll want to remove the iterator after you do the count.  You
> might be able to use it as a scan-only iterator, but I was just being
> lazy.
>
> -Eric
>
>
> On Fri, Oct 11, 2013 at 3:18 PM, Terry P. <[EMAIL PROTECTED]> wrote:
> > Thanks Eric, Jared, and Josh.
> >
> > From Jared's reply I realize that the setiter command obviously stays in
> > effect beyond my shell session.  I see it now with the listiter command
> > in the shell.
> >
> > Our app normally does lookups by rowkey.  Will the firstEntry iterator
> > adversely affect those queries?  I assume not, but I want to double
> > check.
> >
> > Thanks again guys, this is very helpful,
> > Terry
> >
> >
> >
> > On Fri, Oct 11, 2013 at 2:15 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
> >>
> >> Actually, the egrep was used on purpose: it's the only way to get the
> >> shell to use the BatchScanner, which can talk to multiple tservers at
> >> once.
> >>
> >> -Eric
> >>
> >>
> >> On Fri, Oct 11, 2013 at 3:10 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> >> > You'll need to add the '-np' option on the scan command as well.
> >> >
> >> >
> >> > On 10/11/2013 03:05 PM, Jared Winick wrote:
> >> >>
> >> >> After following the commands Eric lists to set the iterator for that
> >> >> table, instead of running 'egrep' in the shell, you could do this
> >> >> from the Linux command line:
> >> >>
> >> >> accumulo shell -u username -p password -e "scan -t foo" | wc -l
> >> >>
> >> >>
> >> >> On Fri, Oct 11, 2013 at 11:42 AM, Eric Newton <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >>     You can stack a counting Combiner over the FirstEntryInRowIterator
> >> >>     and batch scan the table. If it's just a test data set with under a
> >> >>     billion rows, you can just count the result set coming out of the
> >> >>     FirstEntryInRowIterator.  You'll be I/O bound at the client, but it
> >> >>     will work.
> >> >>
> >> >>     This does it with the shell, but the output is kinda voluminous:
> >> >>
> >> >>     root@test> createtable foo
> >> >>     root@test foo> insert row1 cf col1 value
> >> >>     root@test foo> insert row1 cf col2 value
> >> >>     root@test foo> insert row1 cf col999 value
> >> >>     root@test foo> insert row2 cf col1 value
> >> >>     root@test foo> scan
> >> >>     row1 cf:col1 []    value
> >> >>     row1 cf:col2 []    value
> >> >>     row1 cf:col999 []    value
> >> >>     row2 cf:col1 []    value
> >> >>     root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
> >> >>     Only allows iteration over the first entry per row
> >> >>     ----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
> >> >>     root@test foo> egrep .*
> >> >>     row1 cf:col1 []    value
> >> >>     row2 cf:col1 []    value
> >> >>
> >> >>
> >> >>     On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <[EMAIL PROTECTED]> wrote:
> >> >>     > Hi guys,
> >> >>     > I'm still a bit of a newbie as I'm more of an admin than a developer, and
> >> >>     > now that formal testing has begun, I have testers asking me how to get a
> >> >>     > total count of records in Accumulo for verification purposes after test
> >> >>     > ingests have been run.
> >> >>     >
> >> >>     > In our case when I say "records" I mean the number of distinct rowkeys, not
> >> >>     > the total number of entries.
> >> >>     >
> >> >>     > Is there any way to do this using just the Accumulo shell, maybe by writing
> >> >>     > an aggregator or other class that can be run from within the