HBase, mail # user - Spatial data posting in HBase


Re: Spatial data posting in HBase
Adrien Mogenet 2013-10-13, 10:33
This is also what I had in mind. Computing the neighbors and/or the higher
level of a "tile" is fairly easy bit manipulation. Dealing with equator
corner cases should not be considered an issue.
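
A minimal sketch of that bit manipulation, under some assumptions that are
not from the thread: the tile code is an integer with the same number of bits
per axis, longitude and latitude bits are interleaved with longitude first
(the ordering geohash uses), and the function names are made up for
illustration. A base32 geohash string packs 5 bits per character, so this is a
simplification of it, not the implementation from HBase in Action.

```python
def interleave(x, y, bits):
    """Interleave the low `bits` bits of x and y; x bits land in the higher slot of each pair."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i + 1)
        code |= ((y >> i) & 1) << (2 * i)
    return code


def deinterleave(code, bits):
    """Inverse of interleave: recover the (x, y) tile column and row."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i + 1)) & 1) << i
        y |= ((code >> (2 * i)) & 1) << i
    return x, y


def encode(lat, lon, bits):
    """Tile code for a point, using `bits` bits per axis (2 * bits in total)."""
    n = 1 << bits
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)  # column, west to east
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)   # row, south to north
    return interleave(x, y, bits)


def parent(code):
    """Higher (coarser) level: drop the last longitude/latitude bit pair."""
    return code >> 2


def neighbors(code, bits):
    """Codes of the up to 8 tiles surrounding `code` at the same level."""
    n = 1 << bits
    x, y = deinterleave(code, bits)
    result = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            ny = y + dy
            if not 0 <= ny < n:
                continue  # there is no tile beyond a pole
            result.append(interleave((x + dx) % n, ny, bits))  # wrap at +/-180
    return result


# Two points a couple of hundred meters apart, on either side of the equator.
north = encode(0.001, 10.0, 16)
south = encode(-0.001, 10.0, 16)
print(north >> 30 == south >> 30)      # do they share the first two bits?
print(south in neighbors(north, 16))   # is one tile a neighbor of the other?
```

The two points do not share a prefix because the latitude bit flips at the
equator, but the neighbor computation still finds the adjacent tile, so a
query can scan the tile plus its neighbors instead of relying on a common
prefix.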
On Sun, Oct 13, 2013 at 1:16 AM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> You can treat a geohash of a fixed precision as a tile and calculate the
> neighbors of that tile. This is precisely what I did in the chapter in
> HBaseIA. In that way, it's no different than a tile system.
>
>
> On Sat, Oct 12, 2013 at 11:33 AM, Michael Segel
> <[EMAIL PROTECTED]> wrote:
>
> > Adrien,
> >
> > In terms of efficiency...
> >
> > A general solution that can be applied to all problems in all areas is
> > going to be best.
> > Geohash gets ugly when you're around the equator. You can have two points
> > literally a couple of km away that would have two very different
> > geohashes.
> >
> > So if you tile the globe, depending on the size of the tile, you
> > calculate the tile, its surrounding tiles (if necessary) and then sweep
> > through the data to find your object.
> >
> > I'm not suggesting that you not use geohash, just that it's not going to
> > be the most efficient.
> >
> > Note that the downside to tiling is that if you're doing a geospatial
> > index... your data volume explodes because you are storing references to
> > the data at different tile levels.
> >
> > It's a trade-off.
> >
> >
> >
> > On Oct 12, 2013, at 2:34 AM, Adrien Mogenet <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Michael, don't you think Geohashes can be satisfactory and well-suited
> > > for many cases anyway? Searching in a bounding box or arbitrary polygon
> > > is not that heavy with Geohash, even on edge conditions. The biggest
> > > risk IMHO is having to deal with tons of invalid extra points if the
> > > geohash query is not accurate enough and your point distribution is
> > > very sparse, so that many points will be found in a geohash even though
> > > they don't match your query criteria.
> > >
> > > However, if your query embeds enough bits of precision, Geohashes offer
> > > some nice guarantees for distributed databases and your queries should
> > > remain efficient enough.
> > >
> > > Another worst case of course is looking for K-NN, since Geohash is not a
> > > real longest-common-prefix algorithm, but once again, if your point
> > > distribution is approximately well balanced, this works reasonably well
> > > without doing lots of recursive queries or fetching tons of useless data
> > > (but I do agree looking into your tiles would probably be more
> > > appropriate in that case).
> > >
> > > I'm planning to write an article on these points, so further technical
> > > arguments are welcome :-}
> > >
> > > On Thu, Oct 10, 2013 at 7:51 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> > >
> > >> HBase in Action goes into great depth showing you how you could
> > >> implement GIS information in HBase.
> > >>
> > >> Unfortunately there are issues with Geohash and edge conditions which
> > >> make it difficult to use when you're dealing with data on an edge of a
> > >> quadrant.
> > >>
> > >> A better way would be to create a point (geospatial point object) and
> > >> store it in a single column.
> > >> (This goes beyond the example of what's in the book.) And then index
> > >> the data by tiles.
> > >>
> > >>
> > >> The downside is that you end up creating a lot more data…
> > >>
> > >> Take a look at some of the stuff Boris Lublinsky published on InfoQ.
> > >> There are also other articles on the net….
> > >>
> > >> On Oct 9, 2013, at 1:35 PM, Otis Gospodnetic <[EMAIL PROTECTED]>
> > >> wrote:
> > >>
> > >>> The point is that there are options (multiple different hammers) if
> > >>> HBase support for geospatial is not there or doesn't meet OP's needs.
> > >>>
> > >>> Otis
> > >>> --
> > >>> Solr & ElasticSearch Support -- http://sematext.com/
> > >>> Performance Monitoring -- http://sematext.com/spm

Adrien Mogenet
http://www.borntosegfault.com