-
(Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
Jesse Yates 2011-12-22, 19:44

Culvert was originally introduced at Hadoop Summit 2011, but recent updates have made it very applicable to current systems. Recently, we added support for Accumulo and upgraded HBase support to 0.92. Since Hadoop Summit, there has also been significant code cleanup, and we have added some small features. However, we found that most people hadn't heard of Culvert, so we wanted to re-release the framework.

For an introduction to using Culvert, check out the blog post here: http://jyates.github.com/2011/11/17/intro-to-culvert.html

Also, the original presentation (where we discuss the internals) is available on slideshare <http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data>.

There is a Culvert hackathon in the middle of January: http://culverthackathon2012.eventbrite.com/

Oh, and you can find the code on github <https://github.com/booz-allen-hamilton/culvert>.

Below is an overview of why we wrote Culvert and what it does.

Secondary indexing is a common design pattern in BigTable-like databases that allows users to index one or more columns in a table. This technique enables fast search of records in a database based on a particular column instead of the row id, thus enabling relational-style semantics in a NoSQL environment. Frequently, the index is stored either in a reserved namespace in the table or in a separate index table.

Despite the fact that this is a common design pattern in BigTable-based applications, most implementations of this practice to date have been tightly coupled with a particular application. As a result, few general-purpose frameworks for secondary indexing on BigTable-like databases exist, and those that do are tied to a particular implementation of the BigTable model.

There are several existing tools (Solr, Lily), but these are focused on text-based search and are restricted to indexes created through their own frameworks. What if you want to use your existing indexes? Or leverage the indexes to do complex queries?

We developed a solution to this problem called Culvert that supports online index updates as well as a variation of the Hive query language. In designing Culvert, we sought to make the solution pluggable so that it can be used on any of the many BigTable-like databases (HBase, Cassandra, etc.). Furthermore, it is also easily extensible to existing, hand-rolled indexes.

As well as being a secondary indexing framework, it is also a query execution mechanism - think Pig/Hive minus the fancy command line. We support a subset of SQL, but are able to take full advantage of home-rolled and built-in indexes, leading to query execution times potentially orders of magnitude smaller than existing approaches, and queries that are certainly orders of magnitude easier to run.

-- Jesse
-------------------
Jesse Yates
240-888-2200
@jesse_yates
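To make the index-table pattern described in the announcement concrete, here is a minimal in-memory sketch. It is not Culvert's API: the class `IndexedTable` and its methods `put`, `lookup`, and `lookup_all` are hypothetical names invented for illustration, and plain Python dictionaries stand in for the primary table and the index table. It only shows the idea of keeping an index from (column, value) to row ids up to date on every write, and then answering a multi-column equality query by intersecting index entries instead of scanning rows.

```python
from collections import defaultdict


class IndexedTable:
    """Toy stand-in for a BigTable-like table with a secondary index.

    All names here are hypothetical; this is not Culvert's API.
    """

    def __init__(self, indexed_columns):
        # Primary "table": row id -> {column: value}
        self.rows = {}
        self.indexed_columns = set(indexed_columns)
        # Index "table": (column, value) -> set of row ids
        self.index = defaultdict(set)

    def put(self, row_id, columns):
        """Write columns for a row and update the index in the same step."""
        old = self.rows.get(row_id, {})
        # Drop stale index entries so an updated value is not found twice.
        for col in self.indexed_columns:
            if col in old:
                self.index[(col, old[col])].discard(row_id)
        merged = {**old, **columns}
        self.rows[row_id] = merged
        for col in self.indexed_columns:
            if col in merged:
                self.index[(col, merged[col])].add(row_id)

    def lookup(self, column, value):
        """Return sorted row ids where `column` equals `value`, via the index."""
        return sorted(self.index.get((column, value), ()))

    def lookup_all(self, **criteria):
        """AND several equality predicates by intersecting row-id sets."""
        sets = [self.index.get((c, v), set()) for c, v in criteria.items()]
        return sorted(set.intersection(*sets)) if sets else []
```

A caller would create `IndexedTable(["city", "lang"])`, write rows with `put("r1", {"city": "Boston", "lang": "java"})`, and then query with `lookup("city", "Boston")` or `lookup_all(city="Boston", lang="java")`. A real BigTable-like store would keep the index in a reserved namespace or a separate table, as the announcement says, and would have to deal with consistency between the two writes, which this sketch ignores.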
-
Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
Ted Yu 2011-12-22, 21:54

Thanks for the update, Jesse.
Let us know of any feature Culvert needs from HBase.

After cloning Culvert, I got:

[INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:06.638s
[INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
[INFO] Final Memory: 20M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project culvert-accumulo: Could not resolve dependencies for project com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]

Can someone provide a hint?
-
Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
Jesse Yates 2011-12-22, 22:22

Wow, that's embarrassing - project not building...

It's because Accumulo's release is no longer deployed into the standard Apache Maven repository. Maybe one of the Accumulo committers can shed some light on where to find it?

I'll make some changes and have it at least compiling from the raw tonight :)

The alternative is to download the Accumulo source (https://github.com/apache/accumulo) and "mvn clean install" to get it working on your local machine.

Thanks Ted!

-Jesse
-
Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
Ted Yu 2011-12-22, 22:35

Thanks for the hint. That works.

I had to modify culvert-accumulo/pom.xml so that it looks for 1.5.0-incubating-SNAPSHOT, which was built by Accumulo TRUNK.
-
Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
Jesse Yates 2011-12-22, 22:46

I just updated trunk so that we don't build the accumulo package by default.

If you want to build with Accumulo, right now we are supporting the "accumulo-1.3.5-incubating" branch, which supports the current released version of Accumulo (accumulo-1.3.5 <http://incubator.apache.org/accumulo/downloads/downloads.html>). Hopefully, in the near future, we can start hosting the Accumulo snapshots in a publicly accessible Maven repository, and we can merge the accumulo branch back into trunk.
-
Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
David Medinets 2011-12-22, 23:27

+1 to get accumulo into a maven repository.

On Thu, Dec 22, 2011 at 5:46 PM, Jesse Yates <[EMAIL PROTECTED]> wrote:
> Hopefully, in the near future, we can start hosting the accumulo snapshots
> in a publicly accessible maven repository, and we can merge the accumulo
> branch back into trunk.
-
Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
John W Vines 2011-12-23, 14:23

We have yet to release accumulo-1.4, so that was all you working out of your local repo.

As for Accumulo-1.3.5, we are currently working on making the appropriate changes to make it kosher for a maven release, but we're not there yet.

John
-
Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
Mohit Anchlia 2011-12-23, 17:28

I briefly looked at the presentation. May I ask how it is much different from using elasticsearch or solr? As I understand it, terms are being indexed, which is also done by search engines. Just trying to understand the main benefit. We currently use Cassandra.

Thanks
-
Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
Jesse Yates 2011-12-23, 18:48

What about doing -SNAPSHOT releases of dev branches? I know other projects tend to do that, so people can easily pull in current(ish) dev branches for local development against upcoming features.

Thanks!

-Jesse
-
Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systemsJohn W Vines 2011-12-24, 03:14
It's not a problem of deciding which versions to release to maven, etc. It's an issue of having our deployed jars and poms being Apache compliant.
But once we get ourselves in order, that's a pretty good idea.

John

----- Original Message -----
| From: "Jesse Yates" <[EMAIL PROTECTED]>
| To: [EMAIL PROTECTED]
| Sent: Friday, December 23, 2011 1:48:45 PM
| Subject: Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
|
| What about doing -SNAPSHOT releases of dev branches? I know other projects
| tend to do that, so people can easily pull in current(ish) dev branches for
| local development against upcoming features.
|
| Thanks!
|
| -Jesse
-
Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systemsJesse Yates 2011-12-24, 06:02
On Fri, Dec 23, 2011 at 9:28 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> I briefly looked at the presentation. May I ask how is it much
> different than using elasticsearch or solr? As I understand terms are
> being indexed which is also done by search engines. Just trying to
> understand the main benefit. We currently use Cassandra.
>
> Thanks

Culvert is designed not just to do search over documents, but also to do general indexing over all your key-values. Chances are the things you are storing are more than just unstructured text with some special key. If unstructured text is all you have, then some general, text-based indexing is really all you need. Right now, Culvert only supports a built-in text-based index, but it is pretty easy to write new ones. The power in Culvert comes from the fact that it can integrate really easily with existing indexes (legacy systems) and do indexing with some of its built-in indexes.

If you want to look up by something that is not the row key (primary key), then you will need to have an index on that value - this is usually taken care of for you in 'traditional' SQL systems.

On top of just doing the indexing for you, Culvert does a lot of complex query execution with a subset of SQL combined with a decorator design pattern to make it really natural to build up queries. Because this execution is built into the core of Culvert, it leverages all the information you have indexed - this means potentially orders of magnitude faster queries. There is also a lot of potential work here, under the hood, doing query optimization (Culvert is pretty young).

We also can potentially do server-side joins. I don't know what Cassandra supports in this field, but it would need to be something equivalent to coprocessors in HBase (or a modified iterator for Accumulo). Even without the server-side joins, we can still leverage the indexes in doing the joins, making for much more efficient joins.
The Hive adapter is about 90% of the way there as well, which would give you full index support on top of the ease with which Hive lets you write HQL for your tables.

Finally, Culvert allows you to be entirely cross-platform with other BigTable-style databases. All the queries and indexes are developed entirely agnostically to the underlying datastore. So, if you wanted to switch to HBase tomorrow, all you would need to do is copy your data over to the new database (through the Culvert client, though we've discussed adding batch indexing) and then point Culvert at the new install. All your queries stay the same, leveraging the same indexes. The only work you would need to redo is any indexes you wrote by hand.

The adapter for Cassandra really wouldn't be that hard to write - there are pretty good examples of how it works with HBase and Accumulo, so I don't expect the Cassandra part to be that much different.

-Jesse
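For readers new to the pattern discussed in this thread, here is a rough sketch of a secondary index and of building a query by wrapping one constraint in another. It is not Culvert's API: the names `IndexedTable`, `IndexLookup`, `And`, and `fetch` are hypothetical, and plain in-memory dictionaries stand in for the data table and the index tables that a real deployment would keep in HBase or Accumulo.

```python
from collections import defaultdict


class IndexedTable:
    """Toy in-memory stand-in for a data table plus one index table per column.

    Hypothetical illustration only; this is not Culvert's API.
    """

    def __init__(self, indexed_columns):
        self.rows = {}  # row id -> {column: value}
        # column -> value -> set of row ids (plays the role of the index table)
        self.indexes = {col: defaultdict(set) for col in indexed_columns}

    def put(self, row_id, column, value):
        """Write a cell and keep the matching index entry in sync."""
        row = self.rows.setdefault(row_id, {})
        if column in self.indexes:
            if column in row:
                # Drop the stale index entry for the overwritten value.
                self.indexes[column][row[column]].discard(row_id)
            self.indexes[column][value].add(row_id)
        row[column] = value


class IndexLookup:
    """Constraint: rows whose indexed column equals a given value."""

    def __init__(self, column, value):
        self.column = column
        self.value = value

    def row_ids(self, table):
        # Reads only the index; the data rows are never scanned.
        return set(table.indexes[self.column].get(self.value, ()))


class And:
    """Constraint that wraps two others and intersects their row ids."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def row_ids(self, table):
        return self.left.row_ids(table) & self.right.row_ids(table)


def fetch(table, constraint):
    """Resolve a constraint to row ids via the indexes, then read those rows."""
    return {rid: table.rows[rid] for rid in sorted(constraint.row_ids(table))}


if __name__ == "__main__":
    table = IndexedTable(["city", "role"])
    table.put("r1", "city", "Boston")
    table.put("r1", "role", "admin")
    table.put("r2", "city", "Denver")
    table.put("r2", "role", "admin")
    query = And(IndexLookup("city", "Boston"), IndexLookup("role", "admin"))
    print(fetch(table, query))
```

The point of the sketch is that a lookup by a non-row-key value touches only the index and then the matching rows, instead of scanning the whole table, and that the same intersection of row-id sets is what lets indexes speed up joins. Keeping the index consistent on overwrite, as `put` does here, is the part a framework would be expected to handle for you.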