NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Accumulo >> mail # dev >> Accumulo Hive Storage Handler


Brian Femiano 2013-05-04, 03:30
Jason Trost 2013-05-04, 17:16
Re: Accumulo Hive Storage Handler
Hey Jason,

I haven't really stress-tested the HBase and Accumulo storage handler queries
at any respectable scale, where the differences would become clearly
pronounced.

One advantage the AccumuloStorageHandler has over the HBase handler is the
ability to push down predicates involving more than just the rowID.

A WHERE clause of the form "rowid > '5555' AND rowid < '7777' AND name = 'brian'",
when delegated to the HBase handler, would be filtered only on the rowID; the
restriction on the name qualifier is ignored. The HBase handler is not designed
to handle predicates involving columns other than the mapped rowID.
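
To make the difference concrete, here is a minimal Java sketch (not code from either handler) that splits the conjunctive WHERE clause above into the predicates a handler evaluates itself and the residual predicates left for Hive. The Predicate and Split classes and the rowIdOnly flag are hypothetical illustrations; in Hive the corresponding hook is, as far as I know, HiveStoragePredicateHandler.decomposePredicate, which this sketch does not call.

```java
import java.util.ArrayList;
import java.util.List;

class PushdownSplit {

    // Hypothetical model of one term of a conjunctive WHERE clause: column OP literal.
    static final class Predicate {
        final String column;
        final String op;
        final String value;

        Predicate(String column, String op, String value) {
            this.column = column;
            this.op = op;
            this.value = value;
        }

        @Override
        public String toString() {
            return column + " " + op + " '" + value + "'";
        }
    }

    // Predicates the handler applies during the scan vs. those left for Hive.
    static final class Split {
        final List<Predicate> pushed = new ArrayList<>();
        final List<Predicate> residual = new ArrayList<>();
    }

    // rowIdOnly = true models a handler that only understands the mapped rowID column;
    // rowIdOnly = false models a handler that also accepts other column predicates.
    static Split split(List<Predicate> where, String rowIdColumn, boolean rowIdOnly) {
        Split s = new Split();
        for (Predicate p : where) {
            if (!rowIdOnly || p.column.equals(rowIdColumn)) {
                s.pushed.add(p);
            } else {
                s.residual.add(p);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<Predicate> where = List.of(
                new Predicate("rowid", ">", "5555"),
                new Predicate("rowid", "<", "7777"),
                new Predicate("name", "=", "brian"));

        Split rowIdOnly = split(where, "rowid", true);
        System.out.println("rowID-only pushes:   " + rowIdOnly.pushed);
        System.out.println("left for Hive:       " + rowIdOnly.residual);

        Split columnAware = split(where, "rowid", false);
        System.out.println("column-aware pushes: " + columnAware.pushed);
        System.out.println("left for Hive:       " + columnAware.residual);
    }
}
```

Pushing every term in the column-aware case is a simplification; a real handler may accept only certain operators and leave the rest as residual.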

The AccumuloPredicateHandler goes the extra mile to support qualifiers. Not
only are the rowID comparisons built into custom Range restrictions, but an
additional filter iterator is added to skip rows whose name qualifier is not
exactly equal to 'brian'.
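
The two-stage approach (Range restriction, then filter) can be sketched in memory, assuming a TreeMap stands in for a sorted Accumulo table; row IDs compare lexicographically, as Accumulo's byte-sorted keys do. This is not the handler's code and does not use the Accumulo client API: subMap plays the role of the Range, and a plain loop plays the role of the filter iterator. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for a sorted table: rowID -> (qualifier -> value).
class RangeAndFilterScan {

    // Stage 1: the rowID bounds become an exclusive key range, so only rows
    // strictly between startRow and endRow are read (analogous to a Range).
    static NavigableMap<String, Map<String, String>> applyRange(
            NavigableMap<String, Map<String, String>> table, String startRow, String endRow) {
        return table.subMap(startRow, false, endRow, false);
    }

    // Stage 2: stand-in for a filter iterator; keeps only rows whose qualifier
    // exists and whose value is exactly equal to the expected literal.
    static List<String> applyQualifierFilter(
            NavigableMap<String, Map<String, String>> rows, String qualifier, String expected) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            if (expected.equals(row.getValue().get(qualifier))) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("4000", Map.of("name", "brian"));
        table.put("6000", Map.of("name", "brian"));
        table.put("6500", Map.of("name", "jason"));
        table.put("7000", Map.of("age", "30")); // row without a name qualifier
        table.put("8000", Map.of("name", "brian"));

        // WHERE rowid > '5555' AND rowid < '7777' AND name = 'brian'
        NavigableMap<String, Map<String, String>> inRange = applyRange(table, "5555", "7777");
        System.out.println("rows read by the range:  " + inRange.keySet());
        System.out.println("rows kept by the filter: " + applyQualifierFilter(inRange, "name", "brian"));
    }
}
```

Running main prints rows 6000, 6500 and 7000 for the range and only 6000 after the filter; rows 4000 and 8000 are never read at all, which is where the range pays off.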

The HBase handler is more mature in many other storage handler components,
but with respect to predicate pushdown optimization, I believe the Accumulo
implementation is a bit stronger. You're right, though: I should really
back that up with some metrics.

On Sat, May 4, 2013 at 12:16 PM, Jason Trost <[EMAIL PROTECTED]> wrote:

> Hey Brian,
>
> This is pretty cool. Just out of curiosity, do you have any performance
> numbers for this compared to Hive over files or other datastores? I am
> curious how much the iterators speed things up with predicate pushdown.
>
> Thanks,
>
> --Jason
>
>
>
> On Fri, May 3, 2013 at 11:30 PM, Brian Femiano <[EMAIL PROTECTED]> wrote:
>
> > Use Hive to directly and efficiently query data stored in Accumulo tables.
> >
> > See the Getting Started Guide and required AUX_JARS list. The homepage also
> > lists the current limitations.
> >
> > I've submitted a patch ACCUMULO-143 to get this directly into Accumulo
> > trunk, but for now people can experiment with it at:
> > https://github.com/bfemiano/accumulo-hive-storage-manager.
> >
> > The CREATE EXTERNAL TABLE keywords allow Hive to create a metastore entry
> > for the Accumulo table, which 'theoretically' suggests you could use
> > Cloudera Impala directly with Accumulo. I have not tested this, though.
> >
>