HBase >> mail # user >> Spatial data posting in HBase


cto 2013-09-24, 11:15
Ted Yu 2013-09-24, 13:43
Adrien Mogenet 2013-10-04, 08:01
Michael Segel 2013-10-08, 00:20
Michael Segel 2013-10-09, 13:36
Enis Söztutar 2013-10-07, 23:12
Otis Gospodnetic 2013-10-09, 14:34
Michael Segel 2013-10-09, 15:14
Otis Gospodnetic 2013-10-09, 18:35
Michael Segel 2013-10-10, 17:51
Adrien Mogenet 2013-10-12, 07:34
Michael Segel 2013-10-12, 18:33
Nick Dimiduk 2013-10-12, 23:16
Adrien Mogenet 2013-10-13, 10:33
Re: Spatial data posting in HBase
Yes, you can, but you're doing more work to calculate the geohash when you don't have to.

On Oct 13, 2013, at 5:33 AM, Adrien Mogenet <[EMAIL PROTECTED]> wrote:

> This is also what I had in mind. Computing the neighbors and/or the higher
> level of a "tile" is quite an easy bit manipulation. Dealing with equator
> corner cases should not be considered an issue.
>
>
> On Sun, Oct 13, 2013 at 1:16 AM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
>
>> You can treat a geohash of a fixed precision as a tile and calculate the
>> neighbors of that tile. This is precisely what I did in the chapter in
>> HBaseIA. In that way, it's no different than a tile system.
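
To make the geohash-as-tile idea concrete, here is a minimal Python sketch. It is
not the code from HBase in Action: the function names (encode, bbox, neighbors)
are illustrative, and the neighbor lookup simply re-encodes the centers of the
adjacent cells instead of using the bit manipulation mentioned above. It assumes
the standard geohash base-32 alphabet and ignores the special cases at the poles
beyond skipping cells that would fall outside the valid latitude range.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, nbits = [], 0, 0
    even = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        even = not even
        nbits += 1
        if nbits == 5:  # five bits per base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def bbox(gh):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) of the cell a geohash names."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    even = True
    for ch in gh:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if even:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            even = not even
    return lat_lo, lat_hi, lon_lo, lon_hi


def neighbors(gh):
    """Return the geohashes of the (up to) eight cells surrounding `gh`."""
    lat_lo, lat_hi, lon_lo, lon_hi = bbox(gh)
    dlat, dlon = lat_hi - lat_lo, lon_hi - lon_lo
    clat, clon = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    found = set()
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            if i == 0 and j == 0:
                continue
            lat = clat + i * dlat
            if not -90.0 < lat < 90.0:
                continue  # no cell beyond a pole
            lon = (clon + j * dlon + 180.0) % 360.0 - 180.0  # wrap at +/-180
            found.add(encode(lat, lon, len(gh)))
    return found


# Two points roughly 1 km apart, on either side of the equator.
north = encode(0.005, 10.0)
south = encode(-0.005, 10.0)
print(north, south)               # the hashes differ from the first character
print(south in neighbors(north))  # but the cells are adjacent tiles
```

The two hashes share no common prefix, so a plain prefix scan starting from one
point misses the other; scanning the cell plus its neighbors does not.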
>>
>>
>> On Sat, Oct 12, 2013 at 11:33 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>
>>> Adrien,
>>>
>>> In terms of efficiency...
>>>
>>> A general solution that can be applied to all problems in all areas is
>>> going to be best.
>>> Geohash gets ugly when you're around the equator. You can have two points
>>> literally a couple of km apart that have two very different geohashes.
>>>
>>> So if you tile the globe, depending on the size of the tile, you calculate
>>> the tile, its surrounding tiles (if necessary) and then sweep through the
>>> data to find your object.
>>>
>>> I'm not suggesting that you not use geohash, just that it's not going to be
>>> the most efficient.
>>>
>>> Note that the downside to tiling is that if you're doing a geospatial
>>> index... your data volume explodes because you are storing references to
>>> the data at different tile levels.
>>>
>>> It's a trade-off.
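
The tiling approach described above can be sketched as follows. This is only an
illustration under stated assumptions, not Michael's implementation: tiles are
taken to be a fixed-size grid in degrees, the names (tile_of, tiles_to_scan,
index_keys, find_within) are hypothetical, an in-memory dict stands in for the
index table, and the search radius is assumed to be smaller than one tile so
that the 3x3 block of tiles is enough to sweep.

```python
import math


def tile_of(lat, lon, size_deg):
    """Return the (row, col) of the fixed-size tile containing the point."""
    return (math.floor((lat + 90.0) / size_deg),
            math.floor((lon + 180.0) / size_deg))


def tiles_to_scan(lat, lon, size_deg):
    """Return the tile containing the point plus its surrounding tiles."""
    row, col = tile_of(lat, lon, size_deg)
    rows = math.ceil(180.0 / size_deg)
    cols = math.ceil(360.0 / size_deg)
    tiles = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r = row + dr
            if 0 <= r < rows:                        # no tiles beyond the poles
                tiles.append((r, (col + dc) % cols))  # wrap around in longitude
    return tiles


def index_keys(lat, lon, levels=(10.0, 1.0, 0.1)):
    """One index reference per tile level: this is where the volume grows."""
    return [(size, *tile_of(lat, lon, size)) for size in levels]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth (r = 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def build_index(points, size_deg):
    """Group named points by tile; a dict stands in for the index table."""
    index = {}
    for name, (lat, lon) in points.items():
        index.setdefault(tile_of(lat, lon, size_deg), []).append((name, lat, lon))
    return index


def find_within(index, lat, lon, radius_km, size_deg):
    """Sweep the candidate tiles and keep only points inside the radius."""
    hits = []
    for tile in tiles_to_scan(lat, lon, size_deg):
        for name, plat, plon in index.get(tile, []):
            if haversine_km(lat, lon, plat, plon) <= radius_km:
                hits.append(name)
    return sorted(hits)


points = {"north": (0.005, 10.05), "south": (-0.005, 10.05), "far": (5.0, 10.05)}
index = build_index(points, 1.0)
print(find_within(index, 0.001, 10.05, 2.0, 1.0))  # points on both sides of the equator
print(index_keys(0.005, 10.05))                    # three references for one point
```

The equator is just another tile boundary here, which is the efficiency
argument; index_keys shows the cost, since every stored point needs one
reference per tile level.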
>>>
>>>
>>>
>>> On Oct 12, 2013, at 2:34 AM, Adrien Mogenet <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Michael, don't you think Geohashes can be satisfactory and well suited for
>>>> many cases anyway? Searching in a bounding box or arbitrary polygon is not
>>>> that heavy with Geohash, even on edge conditions. The biggest risk IMHO is
>>>> having to deal with tons of invalid extra points if the geohash query is
>>>> not precise enough and your point distribution is very sparse, so that
>>>> many points are found in a geohash even though they don't match your
>>>> query criteria.
>>>>
>>>> However, if your query embeds enough bits of precision, Geohashes offer
>>>> some nice guarantees for distributed databases and your queries should
>>>> remain efficient enough.
>>>>
>>>> Another worst case of course is looking for K-NN, since Geohash is not a
>>>> real longest-common-prefix algorithm. But once again, if your point
>>>> distribution is approximately well balanced, this works reasonably well
>>>> without doing lots of recursive queries or fetching tons of useless data
>>>> (but I do agree that looking into your tiles would probably be more
>>>> appropriate in that case).
>>>>
>>>> I'm planning to write an article on these points, so further technical
>>>> arguments are welcome :-}
>>>>
>>>> On Thu, Oct 10, 2013 at 7:51 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> HBase in Action goes into great depth showing how you could
>>>>> implement GIS information in HBase.
>>>>>
>>>>> Unfortunately there are issues with Geohash and edge conditions which make
>>>>> it difficult to use when you're dealing with data on an edge of a quadrant.
>>>>>
>>>>> A better way would be to create a point (geospatial point object) and
>>>>> store it in a single column.
>>>>> (This goes beyond the example in the book.) And then index the
>>>>> data by tiles.
>>>>>
>>>>>
>>>>> The downside is that you end up creating a lot more data…
>>>>>
>>>>> Take a look at some of the stuff Boris Lublinsky published on InfoQ. There
>>>>> are also other articles on the net….
>>>>>
>>>>> On Oct 9, 2013, at 1:35 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> The point is that there are options (multiple different hammers) if
>>>>>> HBase support for geospatial is not there or doesn't meet OP's needs.
Nick Dimiduk 2013-10-10, 17:40