-
querying for relevant rows
Lam 2012-06-29, 18:19
I'm using a timestamp as a key and the value is all the relevant data starting at that timestamp up to the timestamp represented by the key of the next row.
When querying, I'm given a time span consisting of a start and stop time. I want to return all the relevant data within the time span, so I want to retrieve the appropriate rows (then filter the data for the given time span).
Example:
In Accumulo (the format of the value is <letter>.<timestamp>):
    key=1 value={a.1 b.1 c.2 d.2}
    key=3 value={m.3 n.4 o.5}
    key=6 value={x.6 y.6 z.7}
Query: timespan=[2 4] (get all data from timestamp 2 to 4, inclusive)
Desired result: retrieve key=1 and key=3, then filter out a.1, b.1, and o.5, and return the rest
Problem: How do I know to retrieve key=1 and key=3 without scanning all the keys?
Can I create a scanner that looks for the given start key=2 and then goes back to the prior row (i.e. key=1)?
-- D. Lam
-
Re: querying for relevant rows
William Slacum 2012-06-29, 18:50
You can use a BatchScanner and give it two ranges. It would look something like:
    ArrayList<Range> ranges = new ArrayList<Range>();
    ranges.add(new Range(new Key(timestamp1)));
    ranges.add(new Range(new Key(timestamp2)));
    BatchScanner bs = con.createBatchScanner(...);
    // set your iterators and filters
    bs.setRanges(ranges);
    for (Entry<Key, Value> e : bs) {
        // your stuff
    }
-
Re: querying for relevant rows
Adam Fuchs 2012-06-29, 18:52
You can't scan backwards in Accumulo, but you probably don't need to. What you can do instead is use the last timestamp in the range as the key like this:
    key=2 value={a.1 b.1 c.2 d.2}
    key=5 value={m.3 n.4 o.5}
    key=7 value={x.6 y.6 z.7}
As long as your ranges are non-overlapping, you can just stop when you get to the first key/value pair that starts after your given time range. If your ranges are overlapping then you will have to do a more complicated intersection between forward and reverse orderings to efficiently select ranges, or maybe use some type of hierarchical range intersection index akin to a binary space partitioning tree.
Cheers,
Adam
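The end-keyed layout described in this reply can be sketched without a cluster. The following is only an illustration, not Accumulo client code: a java.util.TreeMap stands in for the sorted table, tailMap() stands in for a scan range that starts at the query's start timestamp and has no end key, and the class and method names (EndKeyedScanSketch, exampleTable, query) are made up for this sketch. It assumes non-overlapping rows, as the reply requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the table; not part of the Accumulo API.
final class EndKeyedScanSketch {

    // Row key = LAST timestamp covered by the row (the re-keyed example data),
    // value = items in "<letter>.<timestamp>" form.
    static TreeMap<Long, List<String>> exampleTable() {
        TreeMap<Long, List<String>> table = new TreeMap<>();
        table.put(2L, Arrays.asList("a.1", "b.1", "c.2", "d.2"));
        table.put(5L, Arrays.asList("m.3", "n.4", "o.5"));
        table.put(7L, Arrays.asList("x.6", "y.6", "z.7"));
        return table;
    }

    // Returns every item whose timestamp lies in [start, stop], reading forward only.
    static List<String> query(TreeMap<Long, List<String>> table, long start, long stop) {
        List<String> hits = new ArrayList<>();
        // The first row whose end key is >= start is the first row that can
        // contain matching data, so no backward step is needed.
        for (Map.Entry<Long, List<String>> row : table.tailMap(start, true).entrySet()) {
            for (String item : row.getValue()) {
                long ts = Long.parseLong(item.substring(item.indexOf('.') + 1));
                if (ts >= start && ts <= stop) {
                    hits.add(item);
                }
            }
            // Rows do not overlap, so once a row ends at or after `stop`,
            // every later row starts after the query range.
            if (row.getKey() >= stop) {
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(query(exampleTable(), 2, 4)); // [c.2, d.2, m.3, n.4]
    }
}
```

For the query [2 4] this reads the rows keyed 2 and 5 and never touches the row keyed 7. The early break relies on each row's key being its last timestamp; with overlapping ranges it would not be safe.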
-
Re: querying for relevant rows
William Slacum 2012-06-29, 18:55
Oh, did I interpret this wrong? I originally thought all of the timestamps would be enumerated as rows, but after re-reading, I get the idea that the rows are being used as markers in a skip-list-like fashion.
-
Re: querying for relevant rows
Lam 2012-06-29, 19:01
This sounds like a good idea. But how do I scan forward -- do I set end=null in the following code?
    Scanner scan = conn.createScanner(tableName, auths);
    Text start = new Text(Value.longToBytes(beginTimestamp));
    Text end = new Text(Value.longToBytes(endTimestamp));
    scan.setRange(new Range(start, true, end, false));
    for (Entry<Key,Value> e : scan) ...
And is it efficient? I.e., the scanner won't move to the next entry until the next iteration through the for loop, right?
I'll run a test right now.
-- D. Lam
-
Re: querying for relevant rows
John Vines 2012-06-29, 19:14
If you set the end to null, it will go until the end of the table.
Scanners bring back results in batches; the default is 1000 key-value pairs. If you know you're only looking for a specific number of keys, you can drop the batch size to match your needs better. But if you end up fetching many smaller batches, your query time will be dominated by network overhead.
John
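The trade-off described here can be put in rough numbers. The sketch below is a simplified model, not a measurement and not Accumulo code: it assumes each batch costs exactly one network round trip, and the class and method names (BatchSizeSketch, roundTrips) are invented for illustration. Real scanners have other costs that this ignores.

```java
// Hypothetical back-of-the-envelope model; not part of the Accumulo API.
final class BatchSizeSketch {

    // Assumption: one round trip per batch, so trips = ceil(entries / batchSize).
    static long roundTrips(long entries, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (entries + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long entries = 2500; // made-up result size for the example
        for (int batchSize : new int[] {1000, 100, 10}) {
            System.out.println("batchSize=" + batchSize + " -> "
                    + roundTrips(entries, batchSize) + " round trips");
        }
    }
}
```

Under this model a small batch size helps only when the scan stops early, as in a lookup that needs a handful of rows; for a long scan it just multiplies the number of round trips.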
-
Re: querying for relevant rows
Lam 2012-06-29, 19:28
Thanks to all! I like this solution. I confirmed what you said and will use scanner.setBatchSize() as appropriate.
-- D. Lam