Accumulo, mail # dev - (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems


Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
Jesse Yates 2011-12-24, 06:02
On Fri, Dec 23, 2011 at 9:28 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> I briefly looked at the presentation. May I ask how is it much
> different than using elasticsearch or solr? As I understand terms are
> being indexed which is also done by search engines. Just trying to
> understand the main benefit. We currently use Cassandra.
>
> Thanks
>

Culvert is designed not just to do search over documents, but also to do
general indexing over all your key-values. Chances are the things you are
storing are more than just unstructured text with some special key. If
that's the case, then some general, text-based indexing is really all you
need. Right now, Culvert only supports a built-in text-based index, but
it is pretty easy to write new ones. The power in Culvert comes from the
fact that it can integrate really easily with existing indexes (legacy
systems) and do indexing with some of its built-in indexes. If you want to
look up by something that is not the row key (primary key), then you will
need to have an index on that value - this is usually taken care of for you
in 'traditional' SQL systems.
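
To make the point about looking up by something other than the row key concrete, here is a minimal sketch in Python. It is not Culvert's API: the in-memory `table`, `build_index`, and `lookup` are hypothetical names used only to illustrate what a secondary index buys you.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a BigTable-style table:
# row key -> {column: value}. This is an illustration, not Culvert code.
table = {
    "row1": {"name": "alice", "city": "paris"},
    "row2": {"name": "bob", "city": "berlin"},
    "row3": {"name": "carol", "city": "paris"},
}


def build_index(table, column):
    """Map each value of `column` to the sorted list of row keys holding it."""
    index = defaultdict(list)
    for row_key in sorted(table):
        value = table[row_key].get(column)
        if value is not None:
            index[value].append(row_key)
    return dict(index)


def lookup(table, index, value):
    """Fetch rows by an indexed value without scanning the whole table."""
    return [table[row_key] for row_key in index.get(value, [])]


city_index = build_index(table, "city")
paris_names = [row["name"] for row in lookup(table, city_index, "paris")]
```

Without `city_index`, answering "who lives in paris" would mean scanning every row; with it, the lookup touches only the matching row keys.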

On top of just doing the indexing for you, Culvert does a lot of complex
query execution with a subset of SQL combined with a decorator design
pattern to make it really natural to build up queries. Because this
execution is built into the core of Culvert, it leverages all the
information you have indexed - this means potentially orders of magnitude
faster queries. There is also a lot of potential work here, under the hood,
doing query optimization (Culvert is pretty young).
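
As a rough illustration of building queries with a decorator-style pattern, the sketch below wraps one query object in another, each exposing the same `execute()` method. The class names (`IndexEquals`, `Intersect`, `Where`) and the sample data are assumptions made for this example and are not taken from Culvert.

```python
# Hypothetical data: a table plus two prebuilt secondary indexes.
table = {
    "r1": {"city": "paris", "lang": "en", "age": 41},
    "r2": {"city": "paris", "lang": "fr", "age": 35},
    "r3": {"city": "paris", "lang": "en", "age": 25},
    "r4": {"city": "berlin", "lang": "en", "age": 50},
}
city_index = {"paris": ["r1", "r2", "r3"], "berlin": ["r4"]}
lang_index = {"en": ["r1", "r3", "r4"], "fr": ["r2"]}


class IndexEquals:
    """Leaf query: row keys whose indexed value equals `value`."""

    def __init__(self, index, value):
        self.index = index
        self.value = value

    def execute(self):
        return set(self.index.get(self.value, ()))


class Intersect:
    """Wraps two queries and keeps row keys present in both (an AND)."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def execute(self):
        return self.left.execute() & self.right.execute()


class Where:
    """Wraps a query and keeps row keys whose row passes `predicate`."""

    def __init__(self, inner, table, predicate):
        self.inner = inner
        self.table = table
        self.predicate = predicate

    def execute(self):
        return {k for k in self.inner.execute() if self.predicate(self.table[k])}


# Roughly: SELECT ... WHERE city = 'paris' AND lang = 'en' AND age > 30
query = Where(
    Intersect(IndexEquals(city_index, "paris"), IndexEquals(lang_index, "en")),
    table,
    lambda row: row["age"] > 30,
)
result = sorted(query.execute())
```

The indexed predicates narrow the candidate set first, so the unindexed `age` filter only reads the rows that survive the intersection.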

We can also potentially do server-side joins. I don't know what Cassandra
supports in this field, but it would need to be something equivalent to
coprocessors in HBase (or a modified iterator for Accumulo). Even without
server-side joins, we can still leverage the indexes when doing the joins,
making them much more efficient.
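
A small client-side sketch of why an index helps a join: instead of comparing every row of one table against every row of the other, the join probes an index on the join column. The tables, the `orders_by_user` index, and `index_join` are hypothetical and only illustrate the idea; they are not Culvert's join implementation.

```python
# Hypothetical tables keyed by row key.
users = {
    "u1": {"name": "alice"},
    "u2": {"name": "bob"},
}
orders = {
    "o1": {"user": "u1", "item": "book"},
    "o2": {"user": "u2", "item": "pen"},
    "o3": {"user": "u1", "item": "lamp"},
}

# Secondary index on orders.user: user row key -> order row keys.
orders_by_user = {"u1": ["o1", "o3"], "u2": ["o2"]}


def index_join(left, right, right_index):
    """Join left rows to right rows by probing an index keyed by the left row key."""
    for left_key in sorted(left):
        for right_key in right_index.get(left_key, []):
            yield left_key, right_key, {**left[left_key], **right[right_key]}


joined = list(index_join(users, orders, orders_by_user))
```

Each user row triggers one index probe rather than a scan of `orders`, which is where the efficiency gain comes from even when the join runs on the client.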

The Hive adapter is about 90% of the way there as well, which would give
you full index support on top of the ease with which Hive lets you write
HQL for your tables.

Finally, Culvert allows you to be entirely cross-platform with other
BigTable-style databases. All the queries and indexes are developed
entirely agnostically to the underlying datastore. So, if you wanted to
switch to HBase tomorrow, all you would need to do is copy your data over
to the database (through the Culvert client, though we've discussed adding
batch indexing) and then point Culvert at the new install. All your queries
stay the same, leveraging the same indexes. The only work you would need to
redo is any indexes you wrote by hand.
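
The datastore-agnostic idea can be sketched as query code written against a small adapter interface, with one adapter per backing store. Everything here (`TableAdapter`, `DictAdapter`, `ListAdapter`, `copy_rows`, `cities_of`) is a hypothetical illustration in Python, not Culvert's actual adapter API.

```python
from abc import ABC, abstractmethod


class TableAdapter(ABC):
    """Hypothetical minimal interface a datastore adapter would implement."""

    @abstractmethod
    def put(self, row_key, column, value):
        ...

    @abstractmethod
    def get(self, row_key):
        ...


class DictAdapter(TableAdapter):
    """Stand-in for one datastore: rows kept as nested dicts."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        return dict(self.rows.get(row_key, {}))


class ListAdapter(TableAdapter):
    """Stand-in for a different datastore: flat (row, column, value) cells."""

    def __init__(self):
        self.cells = []

    def put(self, row_key, column, value):
        self.cells.append((row_key, column, value))

    def get(self, row_key):
        return {c: v for k, c, v in self.cells if k == row_key}


def copy_rows(rows, adapter):
    """Load the same data into whichever store the adapter fronts."""
    for row_key, columns in rows.items():
        for column, value in columns.items():
            adapter.put(row_key, column, value)


def cities_of(adapter, row_keys):
    """Query code that only knows the adapter interface, not the store."""
    return [adapter.get(k).get("city") for k in row_keys]


data = {"row1": {"city": "paris"}, "row2": {"city": "berlin"}}
old_store, new_store = DictAdapter(), ListAdapter()
copy_rows(data, old_store)
copy_rows(data, new_store)
same = cities_of(old_store, ["row1", "row2"]) == cities_of(new_store, ["row1", "row2"])
```

Switching stores here means copying the data and passing a different adapter; `cities_of` itself does not change.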

The adapter for Cassandra really wouldn't be that hard to write - there are
pretty good examples of how it works with HBase and Accumulo, so I don't
expect the Cassandra part to be that much different.

-Jesse

>
> On Fri, Dec 23, 2011 at 6:23 AM, John W Vines <[EMAIL PROTECTED]>
> wrote:
> > We have yet to release accumulo-1.4, so that was all you working out of
> your local repo.
> >
> > As for Accumulo-1.3.5, we are currently working on making the
> appropriate changes to make it kosher for a maven release, but we're
> not there yet.
> >
> > John
> >
> > ----- Original Message -----
> > | From: "Jesse Yates" <[EMAIL PROTECTED]>
> > | To: [EMAIL PROTECTED]
> > | Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
> [EMAIL PROTECTED]
> > | Sent: Thursday, December 22, 2011 5:22:46 PM
> > | Subject: Re: (Re)Introducing Culvert - A secondary indexing framework
> for BigTable like systems
> > | Wow, that's embarrassing - project not building...
> > |
> > | It's because accumulo's release is no longer deployed into the
> > | standard apache maven repository. Maybe one of the accumulo committers
> > | can shed some light on where to find it?
> > |
> > | I'll make some changes and have it at least compiling from the raw
> > | tonight :)
> > |
> > | The alternative is to download accumulo source (

Jesse Yates
240-888-2200
@jesse_yates