Accumulo, mail # dev - Column Scan / table metadata


Re: Column Scan / table metadata
David Medinets 2013-09-19, 03:16
How would you define 'modestly sized tables'? Are you thinking of an
absolute number like 100 billion entries, or some number of entries per
tablet? Or perhaps a time estimate, such as a MapReduce job taking 60
minutes to scan the table?
On Wed, Sep 18, 2013 at 2:57 PM, Josh Elser <[EMAIL PROTECTED]> wrote:

> There isn't a reliable way to ascertain the column set for a table via the
> Accumulo API.
>
> Scanning all of the keys in a table would work; however, this quickly
> becomes too costly to perform for modestly sized tables.
>
> An easy way to manage this is to build up the set of columns as part of
> your "ingest" code and store them in Accumulo (a separate table is
> easiest). By adding a quick cache to your ingest code, you can track a
> column schema without much extra effort or cost.
>
>
> On Wed, Sep 18, 2013 at 2:42 PM, Devin Pinkston <[EMAIL PROTECTED]> wrote:
>
> > I have been looking through the Accumulo source to try and find the best
> > way to derive the column structure/metadata of a table.  If I have a table
> > "sample", and I want to find all the column families/qualifiers, is there a
> > built-in facility in Accumulo to get a list of columns in that table?  Or
> > would my best option be to scan() the entire table, and only put unique
> > column families/qualifiers into a list and return to the user?
> >
> > I am imagining the user has no idea of what their columns are like in this
> > table, that is why I ask.
> >
> > Thanks!
> >
>
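
For illustration, the ingest-side approach Josh describes (track columns as
they are written, with a small cache in front of a separate table) might look
roughly like the sketch below. It is not taken from the Accumulo source. The
names ColumnSink and ColumnTracker are hypothetical, columns are treated as
plain strings for simplicity, and the sink stands in for whatever actually
writes to the separate table (for example, a wrapper around a BatchWriter).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical destination for newly seen columns. In a real ingest job this
 * could wrap a writer on a separate "schema" table; that part is assumed here.
 */
interface ColumnSink {
    void record(String table, String family, String qualifier);
}

/**
 * Hypothetical helper: remembers which (family, qualifier) pairs this process
 * has already recorded, so each pair is sent to the sink at most once.
 */
final class ColumnTracker {
    private final String table;
    private final ColumnSink sink;
    // Thread-safe set acting as the in-memory cache of columns already seen.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    ColumnTracker(String table, ColumnSink sink) {
        this.table = table;
        this.sink = sink;
    }

    /**
     * Call once per ingested entry. Returns true if the column was new to
     * this cache (and was therefore passed to the sink), false otherwise.
     */
    boolean observe(String family, String qualifier) {
        // Length-prefix the family so that distinct pairs such as
        // ("ab", "c") and ("a", "bc") never produce the same cache key.
        String key = family.length() + ":" + family + qualifier;
        if (seen.add(key)) {
            sink.record(table, family, qualifier);
            return true;
        }
        return false;
    }
}
```

The ingest loop would call observe(family, qualifier) for every entry it
writes. Only the first occurrence of each column reaches the sink, so the
extra cost is one set lookup per entry. The cache is per process, so after a
restart, or with several ingesters running, the same column may be recorded
more than once; the separate table should therefore tolerate repeated writes
of the same column.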