THORMAN, ROBERT D
2014-07-24, 14:13
William Slacum
2014-07-24, 17:27
Russ Weeks
2014-07-24, 17:31
Hunter Provyn
2014-07-24, 19:06
Sean Busbey
2014-07-25, 02:14
Anthony F
2014-07-25, 15:45
Sean Busbey
2014-07-25, 16:38
Chris Bennight
2014-07-25, 22:52
Kurt Christensen
2014-07-26, 12:59
Anthony Fox
2014-07-27, 20:15
Jared Winick
2014-07-28, 18:01
Corey Nolet
2014-07-28, 21:11
Chris Bennight
2014-07-29, 00:08
Chris Bennight
2014-07-29, 04:12


ZCurve/Hilbert Curve
Can anyone share a Java method to convert lat/lon (decimal degrees) to a Z-curve value (string)? I need to order my geospatial data into lexical order.
v/r
Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714
Re: ZCurve/Hilbert Curve
Quick Google search yielded:
https://github.com/GeoLatte/geolatte-geom/blob/master/src/main/java/org/geolatte/geom/curve/MortonCode.java
On Thu, Jul 24, 2014 at 10:10 AM, THORMAN, ROBERT D <[EMAIL PROTECTED]> wrote:
Re: ZCurve/Hilbert Curve
Does it have to be a Z-curve? The excellent Accumulo recipes project has a very easy-to-use geospatial index based on a quadtree.
https://github.com/calrissian/accumulo-recipes/tree/master/store/geospatial-store/src/main/java/org/calrissian/accumulorecipes/geospatialstore/support
Russ
On Thu, Jul 24, 2014 at 10:17 AM, William Slacum <[EMAIL PROTECTED]> wrote:
Re: ZCurve/Hilbert Curve
Bob,
There are a number of open source packages / collections of useful things out there you could use to generate Z-curves for Accumulo:
GeoMesa - http://geomesa.github.io/, https://github.com/locationtech/geomesa/ (Disclaimer: I'm a committer to GeoMesa)
Calrissian - https://github.com/calrissian/accumulo-recipes/tree/master/store/geospatial-store
GeoWave - https://github.com/ngageoint/geowave
GeoMesa currently supports Accumulo 1.4 and 1.5. It has a GeoServer plugin to facilitate analysis.
Hunter
On 07/24/2014 10:10 AM, THORMAN, ROBERT D wrote:
Re: ZCurve/Hilbert Curve
Hi Hunter!
Any plans to update GeoMesa for 1.6? Could we convince y'all to contribute a Z-curve lexicoder for the core project? http://accumulo.apache.org/1.6/accumulo_user_manual.html#_lexicoders
Sean
On Jul 24, 2014 2:06 PM, "Hunter Provyn" <[EMAIL PROTECTED]> wrote:
Re: ZCurve/Hilbert Curve
Sean,
GeoMesa plans to update to 1.6 as soon as we get past a 1.0 release (which is waiting on approval by the Eclipse Foundation's IP review process and should happen in a month or so). I think we could break out a Z-curve lexicoder and contribute it before then, though. I'll poke around at the lexicoder stuff in 1.6 and see what it would take to use that interface now.
Thanks,
Anthony
On Thu, Jul 24, 2014 at 10:14 PM, Sean Busbey <[EMAIL PROTECTED]> wrote:
Re: ZCurve/Hilbert Curve
On Fri, Jul 25, 2014 at 8:45 AM, Anthony F <[EMAIL PROTECTED]> wrote:
That sounds great Anthony. Let us know if we can help with anything! Sean 
Re: ZCurve/Hilbert Curve
A couple of related issues come up when considering implementing a dimensionality-reducing encoding - just want to toss those out to see what people think the interface might look like. There are a couple of aspects that could be brought in here, but let's keep it simple and consider the original question: (lat/lon) -> number.

The more bits we add to the Z-curve, the more precise our comparison - i.e. a 63-bit key would have more "locations" to sort by than a 24-bit key. Would you see a reasonable default getting picked, make this user configurable, or both? (i.e. default to a value, extended options with a new constructor?)

I'm extrapolating here, but the only reason I see that locality matters is if we want to preserve locality for range searches. The internal implementation of the encoding/lexicoding process is going to directly impact the implementation of the range query. Now sure, someone could encode the lower left point, encode the upper right point, and construct a range out of that to pass for a scan, but that's going to be wildly inefficient in most cases. See: https://dl.dropboxusercontent.com/u/6649380/bbox.png
If we just lexicode the lower left and upper right we traverse across the entire curve - hitting lots of areas that aren't actually in the original range.

Now we can turn a single 2D range into a set of 1D ranges. There is some potential tuning here, as the algorithm has a tradeoff on time to compute the ranges (and number of ranges) vs. "slop" (the inclusion of ranges which aren't actually in the original query). Would you see a static method, perhaps on the Z-curve lexicoder, that returns a series of ranges based on an input window? Some other mechanism? And in the case of "slop" - would we just document that the ranges could actually include values not expected, or would we always fully decompose?

The other, "making it more complicated" questions would probably revolve around generalizing this to a multidimensional lexicoder.
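[Editor's note: the corner-to-corner inefficiency described above can be seen on a tiny grid. The following is a hypothetical sketch (class and method names are my own, not from any library): on a 4x4 grid with 2 bits per dimension, the single z-range between the encoded corners of a query box covers cells that are not in the box.]

```java
// Toy demo of "slop": the naive z-range between the lower-left and
// upper-right corners of a query box includes cells outside the box.
public final class SlopDemo {
    // Interleave the low `bits` bits of x and y, x bit first (Morton code).
    public static long interleave(long x, long y, int bits) {
        long z = 0;
        for (int i = bits - 1; i >= 0; i--) {
            z = (z << 1) | ((x >> i) & 1);
            z = (z << 1) | ((y >> i) & 1);
        }
        return z;
    }

    public static void main(String[] args) {
        // Query box: cells with x in [1,2] and y in [1,2].
        long lo = interleave(1, 1, 2); // lower-left corner  -> 3
        long hi = interleave(2, 2, 2); // upper-right corner -> 12
        // The naive scan range [3,12] holds 10 z-values, but the box only
        // contains 4 cells: z = 3, 6, 9, 12. For example, z = 4 is cell
        // (0,2), which lies entirely outside the box.
        System.out.println("naive scan range: [" + lo + ", " + hi + "]");
    }
}
```

Six of the ten z-values in that range are false positives, which is the motivation for decomposing the 2D window into multiple 1D ranges.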
I would expect the WGS84 (lat/long) encoder would just be a 2D instance with -180/+180 and -90/+90 bounds. There are probably completely different cases where it would be useful to have a locality sensitive hash. But with the big caveat that this requires more methods - I now need a way of defining a range for each of the dimensions (which before were defined as lat/lon), and I need an interface/class to pass those dimensions to the encoder - and should be able to use those same definitions to decode my values on the way out.

I'm not trying to make a mountain out of a molehill - one could definitely pick some sane defaults, document behaviour, and put in a WGS84-specific implementation. I'm just thinking about the minimum viable bit that's needed for this to be useful - and I suspect it's also the range decomposition piece, as I suspect *everyone* would really need that (if they cared about locality in the first place).

My main question here would be: to what extent would you see this going? A one-off for WGS84, or a more generic multidimensional lexicoder; and in either case, what would the thoughts/consensus be for implementation of the additional methods (and possibly classes) required?
On Fri, Jul 25, 2014 at 12:37 PM, Sean Busbey <[EMAIL PROTECTED]> wrote:
Re: ZCurve/Hilbert Curve
The first thing that comes to my mind is bit shuffling. Scale and shift lat and lon into [0,1) ranges, that is, from 0.000... to 0.111... Then, starting just past the decimal point, take the first lon bit (lon first, since it has double the range of lat), then the first lat bit. Then just alternate - second bits, third bits - until you have as much precision as you like. There are discontinuities in the resulting curve, but any point is easy to encode and decode, and it provides a 1-dimensional sort. I hope that helps.
Kurt
On 7/24/14 10:10 AM, THORMAN, ROBERT D wrote:
Kurt Christensen
P.O. Box 811
Westminster, MD 21158-0811
If you can't explain it simply, you don't understand it well enough. - Albert Einstein
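[Editor's note: the bit shuffling Kurt describes can be sketched in a few lines of Java. This is a minimal illustration, with a class name and signature of my own choosing, not a library method; it normalizes lat/lon into [0,1), quantizes each to `bitsPerDim` bits, and alternates lon and lat bits (lon first).]

```java
// Minimal Morton/Z-curve encode sketch for WGS84 lat/lon.
public final class ZCurveSketch {
    // Interleave the top `bitsPerDim` bits of each normalized coordinate
    // into a single sortable long (lon bit first, then lat bit).
    public static long encode(double lat, double lon, int bitsPerDim) {
        long cells = 1L << bitsPerDim;
        long latBits = (long) (((lat + 90.0) / 180.0) * cells);
        long lonBits = (long) (((lon + 180.0) / 360.0) * cells);
        // Clamp the upper boundary (lat == 90 or lon == 180) into range.
        latBits = Math.min(latBits, cells - 1);
        lonBits = Math.min(lonBits, cells - 1);
        long z = 0;
        for (int i = bitsPerDim - 1; i >= 0; i--) {
            z = (z << 1) | ((lonBits >> i) & 1); // lon bit first
            z = (z << 1) | ((latBits >> i) & 1);
        }
        return z;
    }
}
```

To get the "string" Bob asked for, the resulting long can be zero-padded to a fixed width (e.g. `String.format("%019d", z)`) so lexical order matches numeric order.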
Re: ZCurve/Hilbert Curve
My first thought was just something simple for a first pass - lat/lon -> a single lexicographic dimension - as it would cover most basic use cases. Precision (number of bits in encoding) could be an arg or a config variable.

For WITHIN/INTERSECTS topological predicates, we need to decompose the query geometry into the (possibly/probably non-contiguous) 1D ranges that cover the region in question. GeoMesa has an algorithm to quickly perform this decomposition that computes the minimum number of geohash prefixes at different resolutions to fully cover the query polygon. Basically, it recursively traverses through levels of geohash resolutions, prioritizing rectangles that intersect the region and discarding non-intersecting rectangles at the lowest precisions, to produce a sequence of ranges to scan over. Fully contained rectangles are discovered at their lowest resolution, at which point the algorithm pops the stack and searches the next prioritized prefix. I think something like this would definitely need to be ported over and included in a lexicoder implementation to make it useful.

Also, rather than materialize the entire set of ranges in memory, we can either return a lazy iterator of prefixes that can be fed into a scanner in batches, or we can have a short-circuit config that tunes the amount of slop that's tolerable and cuts off the traversal at a certain level of precision. GeoMesa uses something like the former on attribute indexes to coordinate parallel scanners on separate index and record tables.

Thoughts? I'm inclined to keep the implementation to the bare minimum necessary for the basic use cases (lat/lon and bbox queries), though I do think a general dimensionality-reducing lexicoder would be very useful.
On Fri, Jul 25, 2014 at 6:51 PM, Chris Bennight <[EMAIL PROTECTED]> wrote:
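[Editor's note: the decomposition Anthony describes can be sketched as a recursive quadtree walk. This is a hypothetical illustration with names of my own choosing, not GeoMesa's actual implementation (which is geohash-prefix based and stack-driven); it only shows the shape of the recursion: prune disjoint cells, emit the full z-range of contained cells, split intersecting cells, and cut off at `maxDepth` (the slop knob).]

```java
// Recursive quad decomposition of a query box into z-value ranges.
public final class QuadDecompose {
    public static void decompose(double minX, double minY, double maxX, double maxY,
                                 double qMinX, double qMinY, double qMaxX, double qMaxY,
                                 long zPrefix, int depth, int maxDepth,
                                 java.util.List<long[]> out) {
        // Treat cells and the query as half-open boxes for boundary decisions.
        if (maxX <= qMinX || minX >= qMaxX || maxY <= qMinY || minY >= qMaxY) {
            return; // disjoint: prune this subtree
        }
        boolean contained = minX >= qMinX && maxX <= qMaxX
                         && minY >= qMinY && maxY <= qMaxY;
        if (contained || depth == maxDepth) {
            int shift = 2 * (maxDepth - depth); // z-bits still free in this cell
            long lo = zPrefix << shift;
            out.add(new long[]{lo, lo + (1L << shift) - 1}); // slop only at maxDepth
            return;
        }
        double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
        // Child order must match the bit interleave (x bit, then y bit).
        decompose(minX, minY, midX, midY, qMinX, qMinY, qMaxX, qMaxY, zPrefix << 2,       depth + 1, maxDepth, out);
        decompose(minX, midY, midX, maxY, qMinX, qMinY, qMaxX, qMaxY, (zPrefix << 2) | 1, depth + 1, maxDepth, out);
        decompose(midX, minY, maxX, midY, qMinX, qMinY, qMaxX, qMaxY, (zPrefix << 2) | 2, depth + 1, maxDepth, out);
        decompose(midX, midY, maxX, maxY, qMinX, qMinY, qMaxX, qMaxY, (zPrefix << 2) | 3, depth + 1, maxDepth, out);
    }
}
```

Returning a lazy iterator instead of filling `out`, as Anthony suggests, would avoid materializing millions of ranges at high precisions.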
Re: ZCurve/Hilbert Curve
As several people have commented, a single range for a query can produce a lot of false positives that need to be filtered out. I made this visualization a while back that lets you interactively (click-drag a bounding box) see this behavior. http://bl.ocks.org/jaredwinick/5073432
On Sun, Jul 27, 2014 at 2:15 PM, Anthony Fox <[EMAIL PROTECTED]> wrote:
Re: ZCurve/Hilbert Curve
On Calrissian, we had once considered making a lexicoder for lat/long that transformed the two dimensions (lat/lon) down into a geohash based on the Z-curve. The reason we decided against a first-class datatype for this is exactly the same reason that Anthony brings up in his previous comment - the geohash only makes sense as a query tool when you have a good way to structure (at least as many as possible of) the non-sequential ranges you would need in order to find all the overlapping regions (with as few false positives as possible).
On Mon, Jul 28, 2014 at 2:00 PM, Jared Winick <[EMAIL PROTECTED]> wrote:
Re: ZCurve/Hilbert Curve
Anthony -
So that sounds reasonable - it's sounding like the consensus is a basic WGS84 (lat/long) type (point only), range query (lower left, upper right), and a decompose-fully method (i.e. only intersecting ranges). (Caveat here that we probably want to cap the precision, as a full decomposition of, say, a 31/31-bit Z-curve might take a long time.) What's the performance of your query like for a 24-bit/24-bit (x,y) Z-curve on a full decomposition over a significant portion of the range?

I'm thinking in the interest of keeping it simple/easy it might be best just to fix the precision and document it. 24 bits at the equator would still give us a resolution of ~2m (should be worst case), which should be good enough for most things (basically 0.00002 degrees would be the max granularity for longitude). If it's fast enough I would just cap at 64 (well, maybe 63 for signed issues) total bits to keep the key as a long; 31 bits/31 bits gives a longitudinal granularity of ~0.00000017 degrees - or 1.9cm at the equator. I think a full decomposition on a 31/31-bit grid might be expensive. (The values above assume we normalize the input value - which I think is a given for this implementation?) If we can fix the precision then that takes out most of the variables and it's pretty straightforward - I think decomp performance is the main limiting factor.

Jared - Yep, exactly that - which would be unintuitive behaviour (getting back ranges that didn't really match). That said, an inclusive-only range set can be generated; you can model it as a quadtree decomposition and proceed recursively if your current region ("quad") isn't fully covered by your selection window - otherwise return the range of your current "quad". Which, I think, is essentially what Anthony described in his implementation (though with stack collection instead of recursion; probably for the best there).
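[Editor's note: the granularity figures quoted above can be checked with a quick back-of-envelope helper. This is an illustrative sketch of my own, not from any library; it uses the approximate ~111,320 m per degree of longitude at the equator.]

```java
// Longitude cell size for a given per-dimension bit count, in degrees
// and in meters at the equator.
public final class Granularity {
    static final double METERS_PER_DEGREE_AT_EQUATOR = 111_320.0; // approx.

    // Longitude span of one grid cell: 360 degrees split into 2^bits cells.
    public static double cellDegrees(int bitsPerDim) {
        return 360.0 / Math.pow(2, bitsPerDim);
    }

    public static double cellMeters(int bitsPerDim) {
        return cellDegrees(bitsPerDim) * METERS_PER_DEGREE_AT_EQUATOR;
    }
}
```

With these, 24 bits gives ~0.0000215 degrees (~2.4 m) and 31 bits gives ~0.00000017 degrees (~1.9 cm), in line with the numbers quoted above.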
Corey - So you think the lexicoder would make sense/be reasonable if there were a performant range query tool to go with it (one that resulted in 0 false positives)?
On Mon, Jul 28, 2014 at 5:10 PM, Corey Nolet <[EMAIL PROTECTED]> wrote:
Re: ZCurve/Hilbert Curve
So I put together a quick mockup @
https://github.com/chrisbennight/wgs84lexicoder of what this might work like. It's functional, but I haven't done more than a cursory test, so caveat emptor.

On the performance side, a full decomposition (no slop) (on a Hilbert curve) crossed the 1-second mark at between 18 and 19 bits precision. This is on an i7-920. At 18 bits, an average error (where error is the deviation after an encode/decode cycle due to quantization effects) measurement for longitude was 0.0006639 degrees, or roughly 73 meters at the equator. This should be the worst error you will see - latitude was on average ~1/2 the error by degrees (since it has a smaller range for the same number of bits: -90 to +90 vs. -180 to +180).

Anyway, this was just a simplistic stab to ground the discussion. The performance of the full decomposition was worse than I expected, so that probably places some constraints on what defaults should be. Here's the output of the test I wrote:

bits  time (s)  ranges    actual lat error  worst-case avg lon error
2     0.030     1         30.7720588        60.3973510
3     0.003     3         12.8098739        25.8845790
4     0.002     6         5.3602941         10.8079470
5     0.003     13        2.8071632         5.6910916
6     0.012     46        1.3392857         2.7625355
7     0.025     73        0.6852131         1.3516191
8     0.074     313       0.3153114         0.6264122
9     0.037     419       0.1702976         0.3452521
10    0.040     910       0.0850656         0.1724573
11    0.040     1906      0.0438051         0.0861865
12    0.035     6335      0.0144635         0.0343498
13    0.034     7957      0.0090083         0.0215387
14    0.070     18518     0.0056349         0.0117874
15    0.132     33670     0.0027366         0.0055297
16    0.333     83506     0.0012269         0.0024738
17    0.651     167546    0.0006639         0.0013460
18    0.773     198597    0.0003219         0.0006639
19    1.329     335470    0.0001660         0.0003274
20    5.268     1049209   0.0000767         0.0001523
21    6.735     1433559   0.0000415         0.0000841
22    29.073    5371949   0.0000207         0.0000421
23    44.222    7681223   0.0000107         0.0000210
24    72.172    11490434  0.0000035         0.0000084

On Mon, Jul 28, 2014 at 7:59 PM, Chris Bennight <[EMAIL PROTECTED]> wrote:

