Anthony Fox 2011-10-26, 20:30
All,
I would like to gauge the interest in an extension to Accumulo to enable geospatial capabilities. Currently, I have developed a schema for storing raster data as tiles in Accumulo and a plugin for GeoServer that allows Accumulo tables that use the specified schema to be exposed as WMS layers for importing into a GIS. This is a natural fit for Accumulo since the individual tiles are not large but the aggregate set of tiles that make up a single layer can become very large. Accumulo packages those tiles into blocks and distributes them around the cloud for quick access and redundant storage. The implementation is in an early state.
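As a rough illustration of the tile idea, here is a minimal sketch of one possible row-key layout. It is an assumption made for this example and not the schema described above: the `layer/zoom/x/y` layout, the padding widths, and the function names `tile_row_key` and `column_scan_range` are all hypothetical. The only property it relies on is that Accumulo sorts rows lexicographically, so zero-padded numbers keep neighboring tiles of a layer close together.

```python
def tile_row_key(layer: str, zoom: int, x: int, y: int) -> str:
    """Return a sortable row key for one raster tile (hypothetical layout)."""
    if zoom < 0:
        raise ValueError("zoom level must be non-negative")
    size = 2 ** zoom  # number of tiles per axis at this zoom level
    if not (0 <= x < size and 0 <= y < size):
        raise ValueError("tile coordinates out of range for this zoom level")
    # Zero-padding makes lexicographic order agree with numeric order.
    return f"{layer}/{zoom:02d}/{x:08d}/{y:08d}"


def column_scan_range(layer: str, zoom: int, x: int) -> tuple[str, str]:
    """Return (start, end) row keys covering every tile in column x."""
    size = 2 ** zoom
    return tile_row_key(layer, zoom, x, 0), tile_row_key(layer, zoom, x, size - 1)
```

With a layout like this, all tiles of one layer and zoom level share a key prefix, so a single range scan can fetch a contiguous strip of tiles.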
I am currently investigating the feasibility of implementing an API for storing, querying, and processing vector data in Accumulo. I would like the API to be able to answer nearest neighbor queries, perform on-the-fly reprojections for queries that arrive in a particular projection, and apply various standard geospatial transformations such as buffering and finding intersections. My current thought is that the approach would be similar to how PostGIS extends Postgres in that it dictates a schema and storage format and then provides a user-level API (a set of SQL functions) for processing that data. PostGIS also provides an R-tree index implemented on top of GiST to enable geospatial querying. This type of functionality is also a natural fit for Accumulo, as R-tree minimum bounding rectangles can map to tablet extents. However, this change would require modifications to core functionality. Some mechanism for hooking in alternative 'extents' may be a technique for dealing with this kind of indexing scheme.
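To make the bounding-rectangle idea concrete, here is a minimal sketch of how a query window could be used to pick which extents to scan if each extent carried a minimum bounding rectangle. Everything in it is hypothetical: the `MBR` class, the `extents_to_scan` function, and the extent names are assumptions for illustration, and nothing here is an existing Accumulo API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle (hypothetical helper)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "MBR") -> bool:
        # Rectangles overlap when their ranges overlap on both axes.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)

    def union(self, other: "MBR") -> "MBR":
        # Smallest rectangle that encloses both inputs.
        return MBR(min(self.min_x, other.min_x), min(self.min_y, other.min_y),
                   max(self.max_x, other.max_x), max(self.max_y, other.max_y))


def extents_to_scan(extent_mbrs: dict[str, MBR], query: MBR) -> list[str]:
    """Return the names of extents whose rectangle overlaps the query window."""
    return sorted(name for name, mbr in extent_mbrs.items()
                  if mbr.intersects(query))
```

The pruning step is the point of the sketch: extents whose rectangle misses the query window never need to be read, which is the same benefit an R-tree gives at each level of the tree.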
Is there any interest in these kinds of geospatial processing capabilities in the Accumulo community and has anyone thought about/implemented some geospatial functions?
Thanks, Anthony
Billie J Rinaldi 2011-10-28, 20:52
Anthony,
It sounds interesting. I have been thinking about how to start fostering a set of contrib projects for Accumulo, but am unsure how we would manage such things effectively (e.g. how do we make sure they work? are they versioned and released with Accumulo?). Perhaps we could begin to work this out with your project.
Billie
Todd Lipcon 2011-10-28, 21:06
Hey Billie,
A word of warning on contribs: one thing to be wary of is the "drive by contribution". We found in Hadoop and HBase that many contribs were added to Hadoop as part of a research project or other "passing interest", and then not maintained. Since the core committers had very little knowledge of the contrib components, and the authors were no longer actively maintaining them, they ended up as rotting appendages to our codebase. Users would run into issues and then we'd be unable to help them work through them - not good for anyone.
In HBase, we ended up ejecting our contribs to github. This worked out well - some have done OK, others have died off. But the ones that died off had no maintainers anyway - so better to let them die on their own than drag them forward unmaintained in SVN. We've always had the stance that, if an HBase-related project on github or elsewhere wants to enter contrib, then they can do so provided they have active maintainers who are truly committed to long term maintenance. For example, our REST server module graduated from contrib into a core part of our project, since its maintainers are also HBase committers who run the stuff in production.
Not sure if this is "Apache-like" -- just my opinion as another developer.
-Todd
-- Todd Lipcon Software Engineer, Cloudera
mvangeertruy@... 2011-10-28, 21:53
At Karaf we set up all of the contribs as their own sub-projects. We release them separately and ensure that the core Karaf project doesn't depend on the contribs. This way, if a contrib falls behind, we don't have any dependencies on it and can easily remove it. I would suggest creating an accumulo-geo sub-project for this reason.
Mike Van
ASF Committer
Anthony Fox 2011-11-01, 13:51
I'm fine with whatever guidelines you all put together regarding contribs. My plugin is built against an older version of Accumulo so will need to be ported to the latest version. Initially, I'm just inquiring into whether anyone has interest in geo related functionality and has thought about an architecture for providing that kind of API.
Billie J Rinaldi 2011-11-03, 19:58
I imagine people would be interested. At the last BigDataCamp I attended, there was an entire breakout session about geo for various key/value stores.
I haven't really thought about how such a project would interface with Accumulo. This may be one of those "if you build it they will come" situations.
Billie
Jesse Yates 2011-11-03, 20:06
I like Todd's suggestion of maintaining them on github. Until they have shown that people are actively using and maintaining the project, it's hard to justify adding it to the official svn. Once it goes in, it becomes a much bigger issue to remove it.
However, once they can show there is 'good' community support around the contrib project, I don't see a reason why we couldn't include it as another maven module. I also like the idea of having different profiles for each/multiple contribs so when people want to build a distro of accumulo with contribs X, Y, and Z they can do it from source with one command.
Just my two cents.
-- Jesse ------------------- Jesse Yates 240-888-2200 @jesse_yates