Re: Spatial data posting in HBase
Yes, you can, but you're doing more work to calculate the geohash when you don't have to.

On Oct 13, 2013, at 5:33 AM, Adrien Mogenet <[EMAIL PROTECTED]> wrote:

> This is also what I had in mind. Computing the neighbors and/or the higher
> level of a "tile" is a fairly easy bit manipulation. Dealing with equator
> corner cases should not be considered an issue.
> On Sun, Oct 13, 2013 at 1:16 AM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
>> You can treat a geohash of a fixed precision as a tile and calculate the
>> neighbors of that tile. This is precisely what I did in the chapter in
>> HBaseIA. In that way, it's no different than a tile system.
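A minimal Python sketch of the fixed-precision approach described above: encode a point to a geohash, treat the hash as a tile, and derive the surrounding tiles by stepping one cell width or height away from the tile's center. The function names (geohash_encode, geohash_bbox, neighbors) are illustrative and are not taken from HBase in Action; the only things assumed are the standard geohash base-32 alphabet and longitude-first bit interleaving.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a point as a geohash string of `precision` characters."""
    ranges = [[-180.0, 180.0], [-90.0, 90.0]]  # longitude bits come first
    coords = (lon, lat)
    value, out = 0, []
    for i in range(precision * 5):
        axis = i % 2  # even bits split longitude, odd bits split latitude
        mid = (ranges[axis][0] + ranges[axis][1]) / 2
        bit = 1 if coords[axis] >= mid else 0
        ranges[axis][1 - bit] = mid  # bit 1 raises the low bound, bit 0 lowers the high bound
        value = (value << 1) | bit
        if i % 5 == 4:  # every 5 bits become one base-32 character
            out.append(BASE32[value])
            value = 0
    return "".join(out)


def geohash_bbox(gh):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) of the cell named by `gh`."""
    ranges = [[-180.0, 180.0], [-90.0, 90.0]]
    i = 0
    for ch in gh:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            axis = i % 2
            mid = (ranges[axis][0] + ranges[axis][1]) / 2
            ranges[axis][1 - bit] = mid
            i += 1
    (lon_lo, lon_hi), (lat_lo, lat_hi) = ranges
    return lat_lo, lat_hi, lon_lo, lon_hi


def neighbors(gh):
    """Return the (up to) eight tiles surrounding `gh` at the same precision."""
    lat_lo, lat_hi, lon_lo, lon_hi = geohash_bbox(gh)
    height, width = lat_hi - lat_lo, lon_hi - lon_lo
    center_lat, center_lon = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    found = set()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            lat = center_lat + dy * height
            if not -90.0 < lat < 90.0:
                continue  # no tile beyond a pole
            lon = (center_lon + dx * width + 180.0) % 360.0 - 180.0  # wrap at +/-180
            found.add(geohash_encode(lat, lon, len(gh)))
    return sorted(found)


tile = geohash_encode(0.01, 0.01, 5)    # just north of the equator
south = geohash_encode(-0.01, 0.01, 5)  # just south of it
print(tile, south, south in neighbors(tile))
```

In this sketch the "higher level" of a tile is simply a shorter prefix of the hash (for example tile[:-1]), and the tile on the other side of the equator turns up in the neighbor list even though its hash looks unrelated.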
>> On Sat, Oct 12, 2013 at 11:33 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>> Adrien,
>>> In terms of efficiency...
>>> A general solution that can be applied to all problems in all areas is
>>> going to be best.
>>> Geohash gets ugly when you're around the equator. You can have two points
>>> literally a couple of km apart that have two very different geohashes.
>>> So if you tile the globe, depending on the size of the tile, you calculate
>>> the tile, its surrounding tiles (if necessary) and then sweep through the
>>> data to find your object.
>>> I'm not suggesting that you not use geohash, just that it's not going to be
>>> the most efficient.
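To make the equator case concrete, here is a small illustrative Python sketch (not code from the thread or the book) that encodes two points about 2 km apart on either side of the equator and counts how many leading characters their geohashes share. The helper names geohash_encode and common_prefix_len are hypothetical; the encoder assumes the standard geohash alphabet and longitude-first interleaving.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a point as a geohash string of `precision` characters."""
    ranges = [[-180.0, 180.0], [-90.0, 90.0]]  # longitude bits come first
    coords = (lon, lat)
    value, out = 0, []
    for i in range(precision * 5):
        axis = i % 2
        mid = (ranges[axis][0] + ranges[axis][1]) / 2
        bit = 1 if coords[axis] >= mid else 0
        ranges[axis][1 - bit] = mid
        value = (value << 1) | bit
        if i % 5 == 4:
            out.append(BASE32[value])
            value = 0
    return "".join(out)


def common_prefix_len(a, b):
    """Number of leading characters two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


north = geohash_encode(0.01, 0.01, 7)        # ~1 km north of the equator
south = geohash_encode(-0.01, 0.01, 7)       # ~1 km south of the equator
further_north = geohash_encode(0.03, 0.01, 7)  # ~2 km north of `north`

print(north, south, common_prefix_len(north, south))
print(north, further_north, common_prefix_len(north, further_north))
```

The pair straddling the equator shares no prefix at all, while the pair on the same side, a similar distance apart, shares five characters. A single prefix scan therefore cannot cover both sides of the equator; the query has to cover the adjacent tile as well.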
>>> Note that the downside to tiling is that if you're doing a geospatial
>>> index, your data volume explodes because you are storing references to
>>> the data at different tile levels.
>>> It's a trade-off.
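The volume trade-off can be sketched as follows, assuming a hypothetical equirectangular tiling with 2^level columns and rows per level and a made-up key layout of level/col/row (neither is specified in this thread). Each stored point produces one index reference per tile level.

```python
def tile_key(lat, lon, level):
    """Key of the tile containing (lat, lon) on a 2^level x 2^level grid."""
    n = 1 << level
    # Clamp so that lon == 180 or lat == 90 falls in the last column/row.
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return f"{level:02d}/{col}/{row}"


def index_entries(point_id, lat, lon, levels):
    """One (tile key, point id) reference per tile level."""
    return [(tile_key(lat, lon, level), point_id) for level in levels]


entries = index_entries("p1", 0.01, 0.01, range(1, 11))
for key, point_id in entries:
    print(key, point_id)
print(len(entries), "index rows for a single point")
```

With ten levels, every point costs ten index rows on top of the row holding the point itself, which is the growth in data volume being described.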
>>> On Oct 12, 2013, at 2:34 AM, Adrien Mogenet <[EMAIL PROTECTED]>
>>> wrote:
>>>> Michael, don't you think Geohashes can be satisfactory and well suited for
>>>> many cases anyway? Searching in a bounding box or arbitrary polygon is not
>>>> that heavy with Geohash, even on edge conditions. The biggest risk IMHO is
>>>> having to deal with tons of invalid extra points if the geohash query is
>>>> not precise enough and your point distribution is very sparse, so that
>>>> many points are found in a geohash even though they don't match your
>>>> query criteria.
>>>> However, if your query embeds enough bits of precision, Geohashes offer
>>>> some nice guarantees for distributed databases and your queries should
>>>> remain efficient enough.
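The over-fetch risk described above is usually handled with a two-step query: scan every row whose key starts with the geohash prefix, then discard the points that fall outside the actual query box. The sketch below simulates sorted row keys with a Python list and bisect rather than calling HBase; prefix_scan, in_box, and the sample points are hypothetical.

```python
import bisect

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a point as a geohash string of `precision` characters."""
    ranges = [[-180.0, 180.0], [-90.0, 90.0]]  # longitude bits come first
    coords = (lon, lat)
    value, out = 0, []
    for i in range(precision * 5):
        axis = i % 2
        mid = (ranges[axis][0] + ranges[axis][1]) / 2
        bit = 1 if coords[axis] >= mid else 0
        ranges[axis][1 - bit] = mid
        value = (value << 1) | bit
        if i % 5 == 4:
            out.append(BASE32[value])
            value = 0
    return "".join(out)


def prefix_scan(rows, prefix):
    """Yield rows whose key starts with `prefix`; `rows` must be sorted by key."""
    keys = [row[0] for row in rows]
    start = bisect.bisect_left(keys, prefix)
    for row in rows[start:]:
        if not row[0].startswith(prefix):
            break  # keys are sorted, so no later row can match
        yield row


def in_box(lat, lon, box):
    """Exact containment test against (lat_lo, lat_hi, lon_lo, lon_hi)."""
    lat_lo, lat_hi, lon_lo, lon_hi = box
    return lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi


points = [(0.010, 0.010), (0.030, 0.040), (0.200, 0.200)]
rows = sorted((geohash_encode(lat, lon, 8), lat, lon) for lat, lon in points)

query_box = (0.0, 0.02, 0.0, 0.02)
prefix = geohash_encode(0.01, 0.01, 5)  # coarse tile that covers the query box
candidates = list(prefix_scan(rows, prefix))
matches = [row for row in candidates if in_box(row[1], row[2], query_box)]
print(len(candidates), "scanned,", len(matches), "kept")
```

Here the five-character prefix pulls in two points but only one lies inside the box. A longer prefix would scan fewer rows, at the cost of needing several prefixes to cover the same area.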
>>>> Another worst case of course is looking for K-NN, since Geohash is not a
>>>> real longest-common-prefix algorithm, but once again, if your point
>>>> distribution is approximately well balanced, this works reasonably well
>>>> without doing lots of recursive queries or fetching tons of useless data
>>>> (but I do agree that looking into your tiles would probably be more
>>>> appropriate in that case).
>>>> I'm planning to write an article on these points, so further technical
>>>> arguments are welcome :-}
>>>> On Thu, Oct 10, 2013 at 7:51 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>>>> HBase in Action goes into great depth showing you how you could
>>>>> implement GIS information in HBase.
>>>>> Unfortunately there are issues with Geohash and edge conditions which make
>>>>> it difficult to use when you're dealing with data on an edge of a quadrant.
>>>>> A better way would be to create a point (geospatial point object) and
>>>>> store it in a single column.
>>>>> (This goes beyond the example in the book.) And then index the
>>>>> data by tiles.
>>>>> The downside is that you end up creating a lot more data…
>>>>> Take a look at some of the stuff Boris Lublinsky published on InfoQ. There
>>>>> are also other articles on the net…
>>>>> On Oct 9, 2013, at 1:35 PM, Otis Gospodnetic <
>>>>> wrote:
>>>>>> The point is that there are options (multiple different hammers) if
>>>>>> HBase support for geospatial is not there or doesn't meet OP's needs.