Accumulo >> mail # user >> RE: [External]  Re: Storing, Indexing, and Querying data in Accumulo (geo + timeseries)


Re: [External]  Re: Storing, Indexing, and Querying data in Accumulo (geo + timeseries)

To clarify: by 'geologic', I was referring to the time scale (hundreds of
millions of years, with more detail near the present, suggesting a log scale).
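One way to realize such a log scale is to bucket "years before present" logarithmically, so recent data keeps fine resolution while deep time collapses into coarse buckets. A minimal sketch (the base-10 scale and the buckets-per-decade parameter are my assumptions, not from this thread):

```java
// Map "years before present" onto a small number of buckets, log-scaled so
// that recent history gets fine resolution and deep time gets coarse buckets.
public class LogTimeBucket {
    // bucketsPerDecade controls resolution, e.g. 4 buckets per power of ten.
    static int bucket(double yearsBeforePresent, int bucketsPerDecade) {
        if (yearsBeforePresent < 1.0) {
            return 0; // everything within the last year shares bucket 0
        }
        return (int) Math.floor(Math.log10(yearsBeforePresent) * bucketsPerDecade);
    }

    public static void main(String[] args) {
        System.out.println(bucket(100, 4));         // 1e2 years  -> bucket 8
        System.out.println(bucket(1_000_000, 4));   // 1e6 years  -> bucket 24
        System.out.println(bucket(500_000_000, 4)); // ~5e8 years -> bucket 34
    }
}
```

With 4 buckets per decade, the whole range from 1 year to ~1 billion years before present fits in roughly 36 buckets.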

Your use of id is surprising; maybe I don't understand what you're
trying to do. Since you made reference to time series, I don't see any
efficiency being gained through this id as described. If, instead, the
id were for a whole time series rather than for each individual point,
then for each timestamp you would have X(id, timestamp), Y(id,
timestamp), and whatever else(id, timestamp) already organized as time
series, all with the same row id:
bithash+id, INDEX, id, ... - (query to get a list of IDs intersecting your space-time region)
id, POSITION, XY, vis, TIMESTAMP, (x,y) - (use iterators to filter these points)
id, MEAS, name, vis, TIMESTAMP, named_measurement
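The bithash prefix above can be built by bit-interleaving the quantized coordinates (the 3-D Morton / Z-order pattern mentioned later in the thread). A sketch, assuming each of x, y, and t has already been quantized to a 21-bit integer (that precision is my choice, not from the thread):

```java
// 3-D Morton (Z-order) interleave: spread each 21-bit coordinate so its
// bits occupy every third position, then OR the three spread values together.
public class ZOrder3D {
    // Spread the low 21 bits of v so consecutive bits land 3 positions apart.
    static long spread(long v) {
        v &= 0x1FFFFFL;                      // keep 21 bits (3 * 21 = 63 total)
        v = (v | (v << 32)) & 0x1F00000000FFFFL;
        v = (v | (v << 16)) & 0x1F0000FF0000FFL;
        v = (v | (v << 8))  & 0x100F00F00F00F00FL;
        v = (v | (v << 4))  & 0x10C30C30C30C30C3L;
        v = (v | (v << 2))  & 0x1249249249249249L;
        return v;
    }

    // x, y, t must already be quantized to 21-bit integers.
    static long interleave(long x, long y, long t) {
        return spread(x) | (spread(y) << 1) | (spread(t) << 2);
    }

    public static void main(String[] args) {
        // x=3 sets bits 0 and 3; y=1 sets bit 1 -> 0b1011 = 11
        System.out.println(interleave(3, 1, 0)); // prints 11
    }
}
```

Nearby values in x, y, and t then share long key prefixes, which is what makes a row-prefix scan select a space-time region.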

Alternatively, if you wanted rich points rather than individual values:
bithash+id, INDEX, id, ... - (query to get a list of IDs intersecting your space-time region)
id, SAMPLE, (x,y), vis, TIMESTAMP, sampleObject (JSON?) - (all in one column)
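Either layout gives the same two-step query: prefix-scan the index for ids, then range-scan each id and filter on time. A toy sketch with TreeMaps standing in for the two sorted Accumulo tables (the key layout and `:` separator are illustrative only; in Accumulo the time filter would live in an iterator):

```java
import java.util.*;

// Toy model of the two-table pattern: a sorted index keyed by bithash+id
// yields candidate ids; a data table keyed by id+timestamp is then
// range-scanned per id, with timestamps outside the window dropped.
public class IndirectionSketch {
    static List<String> query(NavigableMap<String, String> index,
                              NavigableMap<String, String> data,
                              String hashPrefix, long tMin, long tMax) {
        List<String> hits = new ArrayList<>();
        // Step 1: prefix scan of the index to collect intersecting ids.
        Set<String> ids = new TreeSet<>(
            index.subMap(hashPrefix, hashPrefix + "\uffff").values());
        // Step 2: per-id range scan of the data table, filtering on time.
        for (String id : ids) {
            for (Map.Entry<String, String> e :
                     data.subMap(id + ":", id + ":\uffff").entrySet()) {
                long ts = Long.parseLong(e.getKey().split(":")[1]);
                if (ts >= tMin && ts <= tMax) {
                    hits.add(e.getValue());
                }
            }
        }
        return hits;
    }
}
```

In a real key design the timestamps would be fixed-width (zero-padded) so that lexicographic order matches numeric order.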

If this is way off base from what you are trying to do, please ignore.

Kurt

-----

On 6/18/13 10:14 PM, Iezzi, Adam [USA] wrote:
> All,
>
> Thank you for all of the replies. To answer some of the questions:
>
> Q: You say you have point data. Are time series geographically fixed, with only the time dimension changing? ... or are the time series moving in space-time?
> A: The time series will be moving in space-time; therefore, the dataset is geologic.
>
> Q: If you have time series (identified by <id>) moving in space-time, then I would add an indirection.
> A: Our dataset is very similar to what you describe. Each geospatial point and timestamp is identified by an id. Since I'm new to the Accumulo world, I'm not very familiar with this pattern/approach in table design, but I will look around now that I have some guidance.
>
> Overall, I think I need to create a space-time hash of my dataset, but the biggest question I have is, "What time span do I use?" At the moment, I only have a year's worth of data; therefore, my MIN_DATE = Jan 01 and MAX_DATE = Dec 31. But we obviously expect this data to continue to grow; therefore, we would want to account for additional data in the future.
>
> Thanks again for all of the guidance. I will digest some of the comments and will report back.
>
> Adam
>
> -----Original Message-----
> From: Kurt Christensen [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, June 18, 2013 8:54 PM
> To: [EMAIL PROTECTED]
> Subject: [External] Re: Storing, Indexing, and Querying data in Accumulo (geo + timeseries)
>
>
> An effective optimization strategy will be largely influenced by the nature of your data.
>
> You say you have point data. Are time series geographically fixed, with only the time dimension changing? ... or are the time series moving in space-time?
>
> I was going to suggest a 3-D approach, bit-interleaving your space and time [modulo timespan] together (or point-tree, or octree, or k-d trie, or r-d trie). The trick there is to pick a time span large enough so that any interval you query is small relative to the time span, but small enough so that you don't waste a bunch (up to an eighth) of your usable hash values with no useful time data (i.e. populate your most significant bits). This would work if your data were geographically fixed, but changing only in time. If your time span is geologic, you might want to use a logarithmic time scale.
>
> If you have time series (identified by <id>) moving in space-time, then I would add an indirection. Use the space-time hash to determine the IDs intersecting your zone and then query again, using the IDs to pull out the time series, filtering with your iterator, perhaps using the native timestamp field.
>
> I hope that helps. Good luck.
>
> Kurt
>
> BTW: 50% filtering isn't really that inefficient. - kkc
>
>
> On 6/18/13 12:36 AM, Jared Winick wrote:
>
>> Have you considered a "geohash" of all 3 dimensions together and using
Kurt Christensen
P.O. Box 811
Westminster, MD 21158-0811

"One of the penalties for refusing to participate in politics is that
you end up being governed by your inferiors."