Re: Spatial data posting in HBase
Nick Dimiduk 2013-10-12, 23:16
You can treat a geohash of a fixed precision as a tile and calculate the
neighbors of that tile. This is precisely what I did in the chapter in
HBaseIA. In that way, it's no different than a tile system.
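The following is a minimal Python sketch of treating a fixed-precision geohash as a tile. It is not the HBaseIA code. It encodes a point, decodes a cell's bounding box, and finds the neighboring cells by stepping one cell width or height from the cell's center and re-encoding. The function names are illustrative, and the sketch assumes the standard geohash base32 alphabet.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def encode(lat, lon, precision):
    """Encode a point as a geohash of `precision` characters."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, nbits, use_lon = [], 0, 0, True  # bits alternate, longitude first
    while len(chars) < precision:
        rng, x = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if x >= mid:
            value, rng[0] = (value << 1) | 1, mid  # keep upper half
        else:
            value, rng[1] = value << 1, mid        # keep lower half
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


def bounds(gh):
    """Return (lat_min, lat_max, lon_min, lon_max) of a geohash cell."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for c in gh:
        value = BASE32.index(c)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lat_rng[0], lat_rng[1], lon_rng[0], lon_rng[1]


def neighbors(gh):
    """Geohashes of the (up to) 8 cells around `gh` at the same precision."""
    lat_min, lat_max, lon_min, lon_max = bounds(gh)
    dlat, dlon = lat_max - lat_min, lon_max - lon_min
    clat, clon = (lat_min + lat_max) / 2, (lon_min + lon_max) / 2
    result = []
    for dy in (-1, 0, 1):
        nlat = clat + dy * dlat
        if not -90.0 < nlat < 90.0:  # no cells beyond the poles
            continue
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nlon = (clon + dx * dlon + 180.0) % 360.0 - 180.0  # wrap at +/-180
            result.append(encode(nlat, nlon, len(gh)))
    return result


print(encode(42.6, -5.6, 5))  # ezs42
print(neighbors("z"))         # ['w', 'x', '8', 'y', 'b']
```

The top-level cell "z" sits against the North Pole, so it has only five neighbors. Its eastern neighbor "b" lies across the antimeridian.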
On Sat, Oct 12, 2013 at 11:33 AM, Michael Segel
> In terms of efficiency...
> A general solution that can be applied to all problems in all areas is
> going to be best.
> Geohash gets ugly when you're around the equator. You can have two points
> literally a couple of km away that would have two very different geohashes.
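The edge effect described above can be reproduced with a compact geohash encoder. The following is a minimal sketch; the encoder and the prefix helper are illustrative. Two points about 222 m apart on either side of the equator share no geohash prefix at all, because the very first latitude bit differs. The same split happens at the prime meridian and at every cell boundary at finer precisions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, precision):
    """Minimal geohash encoder: interleave lon/lat bisection bits, 5 per char."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    for i in range(precision * 5):
        rng, x = (lon_rng, lon) if i % 2 == 0 else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if x >= mid else 0
        rng[1 - bit] = mid  # bit 1 raises the minimum, bit 0 lowers the maximum
        bits.append(bit)
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )


def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


north = encode(0.001, 10.0, 8)   # about 111 m north of the equator
south = encode(-0.001, 10.0, 8)  # about 111 m south of it
print(north, south, common_prefix_len(north, south))  # shared prefix length is 0
```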
> So if you tile the globe, depending on the size of the tile, you calculate
> the tile, its surrounding tiles (if necessary) and then sweep through the
> data to find your object.
> I'm not suggesting that you not use geohash, just that it's not going to be
> the most efficient.
> Note that the downside to tiling is that if you're doing a geospatial
> index... your data volume explodes because you are storing references to
> the data at different tile levels.
> It's a trade-off.
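Here is how the data-volume cost works out. If every object is referenced once per tile level, then N objects indexed at L levels produce N × L index rows. The Python sketch below assumes a hypothetical quadtree-style key (`level/row/col/object_id`, with 2^level × 2^level tiles over the globe). It is not a scheme taken from the thread.

```python
def tile_key(lat, lon, level):
    """Hypothetical quadtree-style key: 2**level x 2**level tiles over the globe."""
    n = 2 ** level
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    return f"{level:02d}/{row}/{col}"


def index_rows(obj_id, lat, lon, levels):
    """One index row per tile level, each referencing the same object."""
    return [(f"{tile_key(lat, lon, lvl)}/{obj_id}", obj_id) for lvl in levels]


rows = index_rows("poi-42", 48.8584, 2.2945, range(4, 16))
print(len(rows))  # 12 index rows for a single stored point
```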
> On Oct 12, 2013, at 2:34 AM, Adrien Mogenet <[EMAIL PROTECTED]>
> > Michael, don't you think Geohashes can be satisfactory and well-suited for
> > many cases anyway? Searching in a bounding box or arbitrary polygon is not
> > that heavy with Geohash, even on edge conditions. The biggest risk IMHO is
> > having to deal with tons of invalid extra points if the geohash query is
> > not accurate enough and your point distribution is very sparse, so that
> > many points will be found in a geohash even though they don't match your
> > query criteria.
> > However, if your query embeds enough bits of precision, Geohashes offer
> > some nice guarantees for distributed databases and your queries should
> > remain efficient enough.
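The guarantees for distributed databases come largely from key ordering. All row keys sharing a geohash prefix sort contiguously, so one geohash cell maps to one row-key range. This Python sketch computes the start and stop rows for such a prefix scan. It assumes a hypothetical key layout where row keys begin with the geohash as ASCII bytes. A bounding box or polygon generally needs several of these ranges, one per covering cell.

```python
def prefix_scan_range(prefix):
    """Return (start_row, stop_row) covering every row key that starts with prefix.

    stop_row is exclusive; b"" means "scan to the end of the table".
    """
    start = prefix.encode("ascii")
    stop = bytearray(start)
    while stop and stop[-1] == 0xFF:  # 0xFF cannot be incremented; drop it
        stop.pop()
    if not stop:
        return start, b""
    stop[-1] += 1  # smallest key greater than every key with this prefix
    return start, bytes(stop)


def in_range(key, start, stop):
    return start <= key and (stop == b"" or key < stop)


start, stop = prefix_scan_range("u4pr")
print(start, stop)  # b'u4pr' b'u4ps'
```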
> > Another worst case, of course, is K-NN search, since Geohash is not a
> > real longest-common-prefix algorithm. But once again, if your point
> > distribution is approximately well balanced, this works reasonably well
> > without doing lots of recursive queries or fetching tons of useless data
> > (but I do agree that looking into your tiles would probably be more
> > efficient in that case).
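With a cell-based fetch, a K-NN query usually ends with a refinement step. The candidates pulled from the query's cell and its neighbors are ranked by true distance and the k closest are kept, which also discards the useless extra points. The Python sketch below uses the haversine formula on a spherical Earth, which is an approximation. The candidate list is hypothetical. The result is exact only if the k-th distance is smaller than the distance from the query to the edge of the searched area. If it is not, the search has to be widened, which is where the recursive queries mentioned above come in.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding


def k_nearest(query, candidates, k):
    """Rank candidate (id, lat, lon) tuples by true distance; keep the k closest."""
    qlat, qlon = query
    return heapq.nsmallest(k, candidates, key=lambda c: haversine_km(qlat, qlon, c[1], c[2]))


fetched = [("a", 0.0, 1.0), ("b", 0.0, 0.5), ("c", 10.0, 10.0)]
print([c[0] for c in k_nearest((0.0, 0.0), fetched, 2)])  # ['b', 'a']
```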
> > I'm planning to write an article on these points, so further technical
> > arguments are welcome :-}
> > On Thu, Oct 10, 2013 at 7:51 PM, Michael Segel <
> [EMAIL PROTECTED]> wrote:
> >> HBase in Action goes into great depth showing how you could
> >> implement GIS information in HBase.
> >> Unfortunately there are issues with Geohash and edge conditions which make
> >> it difficult to use when you're dealing with data near an edge.
> >> A better way would be to create a point (geospatial point object) and
> >> store it in a single column.
> >> (This goes beyond the example in the book.) Then index the
> >> data by tiles.
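Here is a minimal sketch of the point-plus-tile-index layout. In this Python illustration, plain dicts stand in for two HBase tables, and the column name `geo:point`, the 1° tile size, and the `tileid:objectid` index key are all hypothetical. A real deployment would write these rows through an HBase client instead. The point is serialized into one cell value as two big-endian doubles. The index row key starts with a fixed-width tile id, so all objects in one tile can be read with a single prefix scan.

```python
import math
import struct

TILE_DEG = 1.0  # hypothetical tile size in degrees


def encode_point(lat, lon):
    """Serialize a point into one cell value: two big-endian IEEE-754 doubles."""
    return struct.pack(">dd", lat, lon)


def decode_point(value):
    return struct.unpack(">dd", value)


def tile_id(lat, lon):
    """Fixed-width tile id so that all keys of one tile sort together."""
    row = min(math.floor((lat + 90.0) / TILE_DEG), round(180.0 / TILE_DEG) - 1)
    col = math.floor((lon + 180.0) / TILE_DEG) % round(360.0 / TILE_DEG)
    return f"{row:04d}{col:04d}"


def put_object(data_table, index_table, obj_id, lat, lon):
    # Data table: one row per object; the point lives in a single column.
    data_table[obj_id] = {"geo:point": encode_point(lat, lon)}
    # Index table: row key = tile id + object id, so one tile is one prefix scan.
    index_table[f"{tile_id(lat, lon)}:{obj_id}"] = obj_id


data, index = {}, {}
put_object(data, index, "obj1", 1.5, -2.25)
print(sorted(index))                            # ['00910177:obj1']
print(decode_point(data["obj1"]["geo:point"]))  # (1.5, -2.25)
```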
> >> The downside is that you end up creating a lot more data…
> >> Take a look at some of the stuff Boris Lublinsky published on InfoQ. There
> >> are also other articles on the net.
> >> On Oct 9, 2013, at 1:35 PM, Otis Gospodnetic <
> [EMAIL PROTECTED]>
> >> wrote:
> >>> The point is that there are options (multiple different hammers) if
> >>> HBase support for geospatial is not there or doesn't meet OP's needs.
> >>> Otis
> >>> On Wed, Oct 9, 2013 at 11:14 AM, Michael Segel
> >>> <[EMAIL PROTECTED]> wrote:
> >>>> And Solr has what to do with storing data in HBase?
> >>>> I guess it's true… if all you have is a hammer…
> >>>> The point I was raising was that geohash isn't the most efficient way
> >> to go when you look at the problem at a global level…
> >>>> On Oct 9, 2013, at 9:34 AM, Otis Gospodnetic <
> >> [EMAIL PROTECTED]> wrote: