Accumulo user mailing list - Z-Curve/Hilbert Curve


Re: Z-Curve/Hilbert Curve
My first thought was just something simple for a first pass - lat/lon -> a
single lexicographic dimension - as it would cover most basic use cases.
Precision (the number of bits in the encoding) could be an argument or a
config variable. For WITHIN/INTERSECTS topological predicates, we need to
decompose the query geometry into the (possibly/probably non-contiguous) 1D
ranges that cover the region in question.  GeoMesa has an algorithm to
quickly perform this decomposition that computes the minimum number of
geohash prefixes at different resolutions to fully cover the query
polygon.  Basically, it recursively traverses through levels of geohash
resolutions, prioritizing rectangles that intersect the region and
discarding non-intersecting rectangles at the lowest precisions, to produce
a sequence of ranges to scan over. Fully contained rectangles are
discovered at their lowest resolution, at which point the algorithm pops the
stack and searches the next prioritized prefix. I think something like
this would definitely need to be ported over and included in a lexicoder
implementation to make it useful.  Also, rather than materialize the entire
set of ranges in memory, we can either return a lazy iterator of prefixes
that can be fed into a scanner in batches or we can have a short-circuit
config that tunes the amount of slop that's tolerable and cuts off the
traversal at a certain level of precision.  GeoMesa uses something like the
former on attribute indexes to coordinate parallel scanners on separate
index and record tables.
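For illustration, a rough sketch of the recursive decomposition described above - not GeoMesa's actual code - using plain axis-aligned boxes in place of real query geometries and binary-string prefixes in place of geohashes. All class and method names here (PrefixDecomposer, Box, cover) are hypothetical, and this version omits the prioritization and lazy iteration discussed above:

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixDecomposer {

    static class Box {
        final double minX, minY, maxX, maxY;
        Box(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        boolean intersects(Box o) {
            return minX < o.maxX && o.minX < maxX && minY < o.maxY && o.minY < maxY;
        }
        boolean contains(Box o) {
            return minX <= o.minX && o.maxX <= maxX && minY <= o.minY && o.maxY <= maxY;
        }
    }

    // Recursively split the current cell into quadrants: discard cells that
    // don't intersect the query, emit a prefix as soon as its cell is fully
    // contained, and cut off at maxDepth (the "slop" cutoff) by emitting the
    // partially covered cell as-is.
    static void decompose(Box query, Box cell, String prefix, int maxDepth, List<String> out) {
        if (!cell.intersects(query)) return;            // discard at lowest precision
        if (query.contains(cell) || prefix.length() >= maxDepth) {
            out.add(prefix);                            // fully covered, or slop cutoff
            return;
        }
        double midX = (cell.minX + cell.maxX) / 2;
        double midY = (cell.minY + cell.maxY) / 2;
        // Four child cells, one level of precision deeper
        decompose(query, new Box(cell.minX, cell.minY, midX, midY), prefix + "00", maxDepth, out);
        decompose(query, new Box(cell.minX, midY, midX, cell.maxY), prefix + "01", maxDepth, out);
        decompose(query, new Box(midX, cell.minY, cell.maxX, midY), prefix + "10", maxDepth, out);
        decompose(query, new Box(midX, midY, cell.maxX, cell.maxY), prefix + "11", maxDepth, out);
    }

    /** Cover the query box with prefix ranges, starting from the whole world. */
    static List<String> cover(Box query, int maxDepth) {
        List<String> ranges = new ArrayList<>();
        decompose(query, new Box(-180, -90, 180, 90), "", maxDepth, ranges);
        return ranges;
    }
}
```

A query covering exactly the south-west quadrant of the world resolves to the single prefix "00" at depth one, rather than many fine-grained ranges - which is the key property that keeps scan-range counts small.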

Thoughts?  I'm inclined to keep the implementation to the bare minimum
necessary for the basic use cases (lat/lon and bbox queries) though I do
think a general dimensionality reducing lexicoder would be very useful.
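As a sketch of what the bare-minimum lat/lon case might look like, here is a hypothetical two-dimensional Z-order (Morton) encoding with configurable precision. ZOrderLexicoder and PRECISION are illustrative names, not part of Accumulo's Lexicoder API:

```java
public class ZOrderLexicoder {
    // Bits per dimension; in practice this would be the arg/config variable
    // mentioned above.
    private static final int PRECISION = 31;

    /** Normalize a coordinate in [min, max] to an unsigned PRECISION-bit integer. */
    private static long normalize(double v, double min, double max) {
        double scaled = (v - min) / (max - min);
        return (long) (scaled * ((1L << PRECISION) - 1));
    }

    /**
     * Interleave the bits of lon (even bit positions) and lat (odd bit
     * positions) so that nearby points tend to sort near each other in the
     * resulting single lexicographic dimension.
     */
    public static long encode(double lat, double lon) {
        long x = normalize(lon, -180.0, 180.0);
        long y = normalize(lat, -90.0, 90.0);
        long z = 0L;
        for (int i = 0; i < PRECISION; i++) {
            z |= ((x >>> i) & 1L) << (2 * i);
            z |= ((y >>> i) & 1L) << (2 * i + 1);
        }
        return z;
    }
}
```

The south-west corner of the world maps to 0 and the north-east corner to the maximum 62-bit value, so a bbox query becomes one or more contiguous scans over the encoded keys once combined with a range decomposition like the one described above.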
On Fri, Jul 25, 2014 at 6:51 PM, Chris Bennight <[EMAIL PROTECTED]> wrote:
 