Accumulo, mail # user - querying for relevant rows


Re: querying for relevant rows
Adam Fuchs 2012-06-29, 18:52
You can't scan backwards in Accumulo, but you probably don't need to. What
you can do instead is key each row by the last timestamp in its range rather
than the first, like this:

    key=2  value= {a.1 b.1 c.2 d.2}
    key=5  value= {m.3 n.4 o.5}
    key=7  value= {x.6 y.6 z.7}

As long as your ranges are non-overlapping, you can seek to the start of your
given time range, scan forward, and just stop when you get to the first
key/value pair that starts after that time range. If your ranges are
overlapping then you will have to do a more complicated intersection between
forward and reverse orderings to efficiently select ranges, or maybe use some
type of hierarchical range intersection index akin to a binary space
partitioning tree.
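
For the non-overlapping case, here is a rough sketch of the forward-only scan
in plain Python. It is not Accumulo client code: the sorted list `ROWS` and
the function `query` are hypothetical stand-ins for a table and a scanner, and
it assumes the entries inside each value are sorted by timestamp.

```python
from bisect import bisect_left

# Rows keyed by the LAST timestamp they cover, sorted by key.
# Each value is a list of (label, timestamp) pairs, sorted by timestamp.
# This list is a hypothetical stand-in for a sorted Accumulo table.
ROWS = [
    (2, [("a", 1), ("b", 1), ("c", 2), ("d", 2)]),
    (5, [("m", 3), ("n", 4), ("o", 5)]),
    (7, [("x", 6), ("y", 6), ("z", 7)]),
]


def query(rows, start, stop):
    """Return entries with start <= timestamp <= stop, scanning forward only."""
    keys = [key for key, _ in rows]
    # Seek to the first row whose end-timestamp key is >= start.
    i = bisect_left(keys, start)
    result = []
    while i < len(rows):
        _, entries = rows[i]
        # Stop at the first row whose data starts after the time range.
        if entries and entries[0][1] > stop:
            break
        # Filter out entries in this row that fall outside the time range.
        result.extend(e for e in entries if start <= e[1] <= stop)
        i += 1
    return result


print(query(ROWS, 2, 4))
```

With timespan=[2 4] this touches key=2 and key=5, looks at key=7 only long
enough to see that it starts too late, and drops a.1, b.1, and o.5.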

Cheers,
Adam
On Fri, Jun 29, 2012 at 2:19 PM, Lam <[EMAIL PROTECTED]> wrote:

> I'm using a timestamp as a key and the value is all the relevant data
> starting at that timestamp up to the timestamp represented by the key
> of the next row.
>
> When querying, I'm given a time span, consisting of a start and stop
> time.  I want to return all the relevant data within the time span, so
> I want to retrieve the appropriate rows (then filter the data for the
> given timespan).
>
> Example:
> In Accumulo:  (the format of the value is  <letter>.<timestamp>)
>     key=1  value= {a.1 b.1 c.2 d.2}
>     key=3  value= {m.3 n.4 o.5}
>     key=6  value= {x.6 y.6 z.7}
>
> Query:  timespan=[2 4]  (get all data from timestamp 2 to 4 inclusive)
>
> Desired result: retrieve key=1 and key=3, then filter out a.1, b.1, and
> o.5, and return the rest
>
> Problem: How do I know to retrieve key=1 and key=3 without scanning
> all the keys?
>
> Can I create a scanner that looks for the given start key=2 and then
> steps back to the prior row (i.e. key=1)?
>
> --
> D. Lam
>