Accumulo >> mail # user >> querying for relevant rows

Re: querying for relevant rows
You can't scan backwards in Accumulo, but you probably don't need to.
Instead, you can key each row by the last timestamp in its range, like this:

    key=2  value= {a.1 b.1 c.2 d.2}
    key=5  value= {m.3 n.4 o.5}
    key=7  value= {x.6 y.6 z.7}

As long as your ranges are non-overlapping, you can start a forward scan at
the start of your time span and stop when you get to the first key/value
pair whose data starts after the end of the span. Every row before the
scan's starting point ends before the span begins, so nothing is missed. If
your ranges overlap, you will need a more complicated intersection between
forward and reverse orderings to select ranges efficiently, or perhaps some
type of hierarchical range intersection index akin to a binary space
partitioning tree.
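
For the non-overlapping case, here is a rough sketch of the seek-then-scan
logic in plain Python. It does not use the Accumulo API: a sorted list
stands in for the table, and bisect stands in for seeking a scanner to the
start of a range. The names ROWS and query are made up for illustration, and
the sketch assumes every row holds at least one entry, sorted by timestamp.

```python
from bisect import bisect_left

# Rows keyed by the LAST timestamp they contain (the layout suggested above).
# Each value is a list of (label, timestamp) pairs, sorted by timestamp.
# This sorted list is only a stand-in for an Accumulo table.
ROWS = [
    (2, [("a", 1), ("b", 1), ("c", 2), ("d", 2)]),
    (5, [("m", 3), ("n", 4), ("o", 5)]),
    (7, [("x", 6), ("y", 6), ("z", 7)]),
]


def query(rows, start, stop):
    """Return all (label, timestamp) pairs with start <= timestamp <= stop."""
    keys = [key for key, _ in rows]
    # Seek: the first row whose end key is >= start is the first row that
    # can hold relevant data; every earlier row ends before the time span.
    i = bisect_left(keys, start)
    result = []
    # Forward scan only, no stepping back to a prior row.
    while i < len(rows):
        _, entries = rows[i]
        # Stop at the first row whose data starts after the time span.
        if entries[0][1] > stop:
            break
        # Filter the row's entries down to the requested time span.
        result.extend((label, ts) for label, ts in entries if start <= ts <= stop)
        i += 1
    return result


print(query(ROWS, 2, 4))
```

With the example data and timespan=[2 4], this touches key=2 and key=5,
drops a.1, b.1 and o.5, and stops as soon as it sees key=7.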

On Fri, Jun 29, 2012 at 2:19 PM, Lam <[EMAIL PROTECTED]> wrote:

> I'm using a timestamp as a key and the value is all the relevant data
> starting at that timestamp up to the timestamp represented by the key
> of the next row.
> When querying, I'm given a time span, consisting of a start and stop
> time.  I want to return all the relevant data within the time span, so
> I want to retrieve the appropriate rows (then filter the data for the
> given timespan).
> Example:
> In Accumulo:  (the format of the value is  <letter>.<timestamp>)
>     key=1  value= {a.1 b.1 c.2 d.2}
>     key=3  value= {m.3 n.4 o.5}
>     key=6  value= {x.6 y.6 z.7}
> Query:  timespan=[2 4]  (get all data from timestamp 2 to 4 inclusive)
> Desired result: retrieve key=1 and key=3, then filter out a.1, b.1, and
> o.5, and return the rest
> Problem: How do I know to retrieve key=1 and key=3 without scanning
> all the keys?
> Can I create a scanner that looks for the given start key=2 and then goes
> to the prior row (i.e. key=1)?
> --
> D. Lam