Accumulo >> mail # user >> querying for relevant rows


Re: querying for relevant rows
Oh, did I interpret this wrong? I originally thought all of the timestamps
would be enumerated as rows, but after re-reading, I kind of get the idea
that the rows are being used as markers in a skip-list-like fashion.

On Fri, Jun 29, 2012 at 11:52 AM, Adam Fuchs <[EMAIL PROTECTED]> wrote:

> You can't scan backwards in Accumulo, but you probably don't need to. What
> you can do instead is use the last timestamp in the range as the key like
> this:
>
>     key=2  value= {a.1 b.1 c.2 d.2}
>     key=5  value= {m.3 n.4 o.5}
>     key=7  value= {x.6 y.6 z.7}
>
> As long as your ranges are non-overlapping, you can just stop when you get
> to the first key/value pair that starts after your given time range. If
> your ranges are overlapping then you will have to do a more complicated
> intersection between forward and reverse orderings to efficiently select
> ranges, or maybe use some type of hierarchical range intersection index
> akin to a binary space partitioning tree.
>
> Cheers,
> Adam
>
>
>
> On Fri, Jun 29, 2012 at 2:19 PM, Lam <[EMAIL PROTECTED]> wrote:
>
>> I'm using a timestamp as a key and the value is all the relevant data
>> starting at that timestamp up to the timestamp represented by the key
>> of the next row.
>>
>> When querying, I'm given a time span, consisting of a start and stop
>> time.  I want to return all the relevant data within the time span, so
>> I want to retrieve the appropriate rows (then filter the data for the
>> given timespan).
>>
>> Example:
>> In Accumulo:  (the format of the value is  <letter>.<timestamp>)
>>     key=1  value= {a.1 b.1 c.2 d.2}
>>     key=3  value= {m.3 n.4 o.5}
>>     key=6  value= {x.6 y.6 z.7}
>>
>> Query:  timespan=[2 4]  (get all data from timestamp 2 to 4, inclusive)
>>
>> Desired result: retrieve key=1 and key=3, then filter out a.1, b.1, and
>> o.5, and return the rest
>>
>> Problem: How do I know to retrieve key=1 and key=3 without scanning
>> all the keys?
>>
>> Can I create a scanner that looks for the given start key=2 and goes to
>> the prior row (i.e. key=1)?
>>
>> --
>> D. Lam
>>
>
>
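
For reference, here is a rough sketch of the forward-only scan Adam describes, using the end-timestamp layout from his example. It is plain Python over an in-memory sorted list, not Accumulo client code: ROWS, query, and the (letter, timestamp) tuples are made-up names for illustration, and bisect_left only stands in for seeking a scanner to the first key at or after the start time. It assumes non-overlapping ranges and that every row holds at least one entry.

```python
from bisect import bisect_left

# Rows keyed by the LAST timestamp each row covers, kept in sorted key order.
# This is an in-memory stand-in for the table layout in the thread's example.
ROWS = [
    (2, [("a", 1), ("b", 1), ("c", 2), ("d", 2)]),
    (5, [("m", 3), ("n", 4), ("o", 5)]),
    (7, [("x", 6), ("y", 6), ("z", 7)]),
]


def query(rows, start, stop):
    """Return all (letter, timestamp) entries with start <= timestamp <= stop."""
    keys = [key for key, _ in rows]
    # Seek forward to the first row whose end timestamp is >= start.
    # Earlier rows end before the query span, so they cannot contribute.
    first = bisect_left(keys, start)
    result = []
    for _, entries in rows[first:]:
        # Stop at the first row that starts after the query span.
        # Assumes each row has at least one entry.
        if min(ts for _, ts in entries) > stop:
            break
        # Filter the row's contents down to the requested span.
        result.extend((name, ts) for name, ts in entries if start <= ts <= stop)
    return result


if __name__ == "__main__":
    print(query(ROWS, 2, 4))
```

With timespan=[2 4] this visits key=2 and key=5, drops a.1, b.1, and o.5, and stops as soon as it sees the key=7 row, so no backward scan is needed. Overlapping ranges are not handled here.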