-
TimeSpan Iterator - Bob.Thorman@..., 2012-08-28, 13:26
Does anyone know of a TimeSpan Iterator that will fetch rows based on the Accumulo timestamp?

Bob Thorman
Engineering Fellow
L-3 Communications, ComCept
1700 Science Place
Rockwall, TX 75032
(972) 772-7501 work
-
Re: TimeSpan Iterator - John Armstrong, 2012-08-28, 13:33
On 08/28/2012 09:26 AM, [EMAIL PROTECTED] wrote:
> Does anyone know of a TimeSpan Iterator that will fetch rows based on
> the accumulo timestamp?

We actually wrote our own TimestampRangeIterator and TimestampSetIterator classes. I don't know if 1.4 has any in the core libraries. It's not very hard though.
-
Re: TimeSpan Iterator - William Slacum, 2012-08-28, 16:02
I think you'd probably just want to set a filter, because there may not be any relationship between an arbitrary key and the timestamp set on it.
-
Re: TimeSpan Iterator - Billie Rinaldi, 2012-08-28, 16:45
On Tue, Aug 28, 2012 at 6:33 AM, John Armstrong <[EMAIL PROTECTED]> wrote:
> We actually wrote our own TimestampRangeIterator and TimestampSetIterator
> classes. I don't know if 1.4 has any in the core libraries. It's not very
> hard though.

There's a TimestampFilter in org.apache.accumulo.core.iterators.user in 1.4. It uses a range of timestamps. Users should be aware that this is not an efficient operation, though.

Billie
-
RE: TimeSpan Iterator - Bob.Thorman@..., 2012-08-28, 16:47
You're right, there is no relationship. But for every rowid I want only those entries within a particular timespan. So for the columns below I would want to specify a timespan (e.g. 123456786-123456788) and get only those three rows back.

    12345 CF1:CQ1 [public] 123456789 : Value1
    12345 CF1:CQ1 [public] 123456788 : Value1
    12345 CF1:CQ1 [public] 123456787 : Value1
    12345 CF1:CQ1 [public] 123456786 : Value1
    12345 CF1:CQ1 [public] 123456785 : Value1
-
RE: TimeSpan Iterator - Bob.Thorman@..., 2012-08-28, 16:51
Billie,

Your comment "Users should be aware that this is not an efficient operation, though." may help me decide whether my current use of a secondary time index is the better approach. Right now I maintain a table that has timestamps as the rowid, whose values are the rowids in a metadata table. So I do one range scan based on the timestamp, then a second lookup of the metadata rowid. Is this more efficient?
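
For readers following along, the two-step lookup described in this message can be sketched in plain Java, with sorted maps standing in for the two tables. This is only an illustration under stated assumptions: the class `TimeIndexSketch`, its methods, and the sample row IDs are hypothetical, and nothing here uses the Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: sorted maps stand in for the two Accumulo tables.
final class TimeIndexSketch {
    // Index table: timestamp as the row ID, values are row IDs in the metadata table.
    private final TreeMap<Long, List<String>> timeIndex = new TreeMap<>();
    // Metadata table: row ID mapped to its stored value.
    private final Map<String, String> metadata = new TreeMap<>();

    // Writes the value and maintains the time index alongside it.
    void put(String rowId, long timestamp, String value) {
        metadata.put(rowId, value);
        timeIndex.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(rowId);
    }

    // Returns the values whose timestamp lies in [start, end]; requires start <= end.
    List<String> lookup(long start, long end) {
        List<String> results = new ArrayList<>();
        // Step 1: a single range scan over the index table.
        for (List<String> rowIds : timeIndex.subMap(start, true, end, true).values()) {
            // Step 2: one lookup in the metadata table per row ID found.
            for (String rowId : rowIds) {
                results.add(metadata.get(rowId));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        TimeIndexSketch tables = new TimeIndexSketch();
        tables.put("rowA", 100L, "valueA");
        tables.put("rowB", 200L, "valueB");
        tables.put("rowC", 300L, "valueC");
        System.out.println(tables.lookup(150L, 300L));
    }
}
```

The cost of this design is one metadata lookup per index hit, so it pays off mainly when the time range selects a small share of the metadata rows.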
-
Re: TimeSpan Iterator - John Armstrong, 2012-08-28, 16:55
On 08/28/2012 12:47 PM, [EMAIL PROTECTED] wrote:
> You're right there is no relationship. But for every rowid I want only those within a particular
> timespan. So the columns below I would want to specify a timespan (e.g. 123456786-123456788) and
> get only those three rows back.
>
> From: William Slacum [mailto:[EMAIL PROTECTED]]
>
>> I think you'd probably just want to set a filter, because there may not be any relationship between
>> an arbitrary key and the timestamp set on it.

I think the point is that subclassing org.apache.accumulo.core.iterators.Filter will get most of the behavior you want in your Iterator; you just need to fill in what "accept(Key k, Value v)" means to your Filter.
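
As a rough illustration of the accept() idea, here is a self-contained sketch in plain Java. It does not use the Accumulo classes: `Entry` and `TimestampRangeSketch` are hypothetical stand-ins for Key/Value and for a Filter subclass, and the inclusive bounds are an assumption chosen to match the example above, where 123456786-123456788 selects three entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Filter subclass; not the real Accumulo API.
final class TimestampRangeSketch {
    // Hypothetical stand-in for a Key/Value pair.
    static final class Entry {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        Entry(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final long start; // inclusive lower bound (assumption)
    private final long end;   // inclusive upper bound (assumption)

    TimestampRangeSketch(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // The only logic a timestamp filter needs: is this entry inside the range?
    boolean accept(Entry e) {
        return e.timestamp >= start && e.timestamp <= end;
    }

    // A filter examines every scanned entry and keeps the accepted ones.
    List<Entry> apply(List<Entry> scanned) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : scanned) {
            if (accept(e)) {
                kept.add(e);
            }
        }
        return kept;
    }

    // The five sample entries from the message quoted above, newest first.
    static List<Entry> sampleEntries() {
        return Arrays.asList(
            new Entry("12345", "CF1:CQ1", 123456789L, "Value1"),
            new Entry("12345", "CF1:CQ1", 123456788L, "Value1"),
            new Entry("12345", "CF1:CQ1", 123456787L, "Value1"),
            new Entry("12345", "CF1:CQ1", 123456786L, "Value1"),
            new Entry("12345", "CF1:CQ1", 123456785L, "Value1"));
    }

    public static void main(String[] args) {
        TimestampRangeSketch filter = new TimestampRangeSketch(123456786L, 123456788L);
        for (Entry e : filter.apply(sampleEntries())) {
            System.out.println(e.row + " " + e.column + " " + e.timestamp + " : " + e.value);
        }
    }
}
```

With the sample data, the filter keeps the entries stamped 123456788, 123456787, and 123456786, and drops the other two. Every entry is still examined.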
-
Re: TimeSpan Iterator - William Slacum, 2012-08-28, 17:02
It could be inefficient if you can't narrow down your search of a table to specific ranges, via an index or some hint stored in the key that you can use to seek() around. You're left with doing an exhaustive search of the data, even if clients will only see data that matches your filtering criteria.
-
Re: TimeSpan Iterator - Billie Rinaldi, 2012-08-28, 18:04
On Tue, Aug 28, 2012 at 9:51 AM, <[EMAIL PROTECTED]> wrote:
> Your comment “Users should be aware that this is not an efficient
> operation, though.” may help me decide if my current use of a secondary
> time index is better then. Right now I maintain a table that has
> timestamps as the rowid whose values are the rowid in a metadata table.
> Therefore I do one range scan based on the timestamp. Then a second lookup
> of the metadata rowid. Is this more efficient?

It probably depends on what percentage of the data you're bringing back, as compared to the amount you're scanning over (if that's not the whole table). I would hypothesize if you're bringing more than N% of the data back, you might as well just use the TimestampFilter on the main table. If you're bringing a smaller percentage back, it could be better to reduce the amount of the main table you have to scan over by maintaining a secondary time index. I'm not sure what N would be.

You should also make sure that the secondary index is actually reducing the amount of the main table you're scanning over, e.g. if each rowid had a full range of timestamps, you could be pulling a list of all rowids back from the index table and not reducing the scan over the main table.

Also, the TimestampFilter is not optimized. Filters evaluate each key/value pair to see if it is accepted (in this case, if it is in a timestamp range). If there are a lot of timestamps for each cell (keys that are identical except for timestamp), it would be better to use seeking instead. That would involve writing a new iterator. If there aren't many timestamps for each cell, seeking won't help and the TimestampFilter will be fine.

Billie
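
To make the seeking idea above concrete, here is a minimal sketch in plain Java rather than a real Accumulo iterator. A sorted set stands in for the table, ordered by cell and then by timestamp descending, which is how Accumulo orders versions of a cell. The names `SeekSketch`, `Key`, `newTable`, and `seekingScan` are hypothetical, and the inclusive bounds are an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch of seeking past out-of-range versions of each cell.
final class SeekSketch {
    // Hypothetical key: a cell name (row plus column) and a timestamp.
    static final class Key {
        final String cell;
        final long timestamp;

        Key(String cell, long timestamp) {
            this.cell = cell;
            this.timestamp = timestamp;
        }
    }

    // Sorted by cell ascending, then timestamp descending (newest first).
    static NavigableSet<Key> newTable() {
        Comparator<Key> order = (a, b) -> {
            int c = a.cell.compareTo(b.cell);
            return c != 0 ? c : Long.compare(b.timestamp, a.timestamp);
        };
        return new TreeSet<>(order);
    }

    // Returns keys with start <= timestamp <= end, jumping over the rest.
    static List<Key> seekingScan(NavigableSet<Key> table, long start, long end) {
        List<Key> out = new ArrayList<>();
        Key cursor = table.isEmpty() ? null : table.first();
        while (cursor != null) {
            String cell = cursor.cell;
            // Seek directly to the newest version of this cell that is <= end.
            Key k = table.ceiling(new Key(cell, end));
            // Read versions until we fall below start or leave the cell.
            while (k != null && k.cell.equals(cell) && k.timestamp >= start) {
                out.add(k);
                k = table.higher(k);
            }
            // Seek past all remaining (older) versions to the next cell.
            cursor = table.higher(new Key(cell, Long.MIN_VALUE));
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<Key> table = newTable();
        for (long ts = 1; ts <= 1000; ts++) {
            table.add(new Key("cellA", ts));
            table.add(new Key("cellB", ts));
        }
        for (Key k : seekingScan(table, 499L, 500L)) {
            System.out.println(k.cell + " @ " + k.timestamp);
        }
    }
}
```

In this toy model each cell costs two seeks plus the versions actually returned, instead of a read of every version. In a real iterator a seek has its own cost, which is consistent with the point above that seeking only helps when each cell has many timestamps.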