Accumulo >> mail # dev >> Column Scan / table metadata


Re: Column Scan / table metadata
How would you define 'modestly-sized tables'? Are you thinking of an
absolute number like 100 Billion entries or some number of entries per
tablet? Or perhaps a time estimate - like a map-reduce job takes 60 minutes
to scan the table?
On Wed, Sep 18, 2013 at 2:57 PM, Josh Elser <[EMAIL PROTECTED]> wrote:

> There isn't a reliable way to ascertain the column set for a table via the
> Accumulo API.
>
> Scanning all of the keys in a table would work; however, this quickly
> becomes too costly to perform for modestly sized tables.
>
> An easy way to manage this is to build up the set of columns as part of
> your "ingest" code and store them in Accumulo (a separate table is
> easiest). By adding a quick cache to your ingest code, you can track a
> column schema without much extra effort or cost.
>
>
> On Wed, Sep 18, 2013 at 2:42 PM, Devin Pinkston <[EMAIL PROTECTED]> wrote:
>
> > I have been looking through the Accumulo source to try and find the best
> > way to derive the column structure/metadata of a table.  If I have a table
> > "sample", and I want to find all the column families/qualifiers, is there a
> > built-in facility in Accumulo to get a list of columns in that table?  Or
> > would my best option be to scan() the entire table, and only put unique
> > column families/qualifiers into a list and return to the user?
> >
> > I am imagining the user has no idea of what their columns are like in this
> > table, that is why I ask.
> >
> > Thanks!
> >
>
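
For illustration, the ingest-side approach suggested above (remember which columns have already been seen, and record each new one once) could be sketched roughly as follows. This is only a sketch under assumptions, not code from the thread: ColumnTracker, observe() and the sink callback are hypothetical names, and the sink stands in for whatever would write a row to the separate Accumulo "columns" table. No Accumulo classes are used, so the snippet runs on its own.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical helper: tracks the distinct (family, qualifier) pairs seen during ingest. */
final class ColumnTracker {
    // In-memory cache of columns already recorded, so each pair is persisted only once.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    // Assumed sink: in a real ingest job this would write to a separate Accumulo table.
    private final BiConsumer<String, String> sink;

    ColumnTracker(BiConsumer<String, String> sink) {
        this.sink = sink;
    }

    /** Call once per ingested key. Returns true if this column had not been seen before. */
    boolean observe(String family, String qualifier) {
        // Length-prefix the family so ("a", "b:c") and ("a:b", "c") map to different cache keys.
        String cacheKey = family.length() + ":" + family + qualifier;
        if (seen.add(cacheKey)) {
            sink.accept(family, qualifier);
            return true;
        }
        return false;
    }

    /** Number of distinct columns observed so far. */
    int distinctColumns() {
        return seen.size();
    }

    public static void main(String[] args) {
        // Demo sink that just prints; a real one would persist the pair.
        ColumnTracker tracker =
            new ColumnTracker((f, q) -> System.out.println("new column: " + f + ":" + q));
        tracker.observe("attr", "name");
        tracker.observe("attr", "name"); // repeat, served from the cache
        tracker.observe("attr", "size");
        System.out.println("distinct columns: " + tracker.distinctColumns());
    }
}
```

The same observe() call could also be fed from a one-off full-table scan, which is the fallback discussed above when ingest cannot be changed. In that case the cost is dominated by reading every key rather than by the de-duplication. Note that the cache only lives for the lifetime of the ingest process, so after a restart each column would be written to the sink again; that is harmless if the write is idempotent.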