
Accumulo user mailing list: TimeSpan Iterator


Re: TimeSpan Iterator
On Tue, Aug 28, 2012 at 9:51 AM, <[EMAIL PROTECTED]> wrote:

> Billie
>
> Your comment “Users should be aware that this is not an efficient
> operation, though.” may help me decide whether my current use of a
> secondary time index is the better approach.  Right now I maintain a table
> that has timestamps as the rowid, whose values are rowids in a metadata
> table.  So I do one range scan based on the timestamp, then a second lookup
> of the metadata rowid.  Is this more efficient?
>

It probably depends on what percentage of the data you're bringing back,
compared to the amount you're scanning over (if that's not the whole
table).  I would hypothesize that if you're bringing back more than N% of
the data, you might as well just use the TimestampFilter on the main table.
If you're bringing back a smaller percentage, it could be better to reduce
the amount of the main table you have to scan over by maintaining a
secondary time index.  I'm not sure what N would be.  You should also make
sure that the secondary index actually reduces the amount of the main table
you're scanning over; e.g. if each rowid had a full range of timestamps,
you could be pulling a list of all rowids back from the index table and not
reducing the scan over the main table.
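To make that trade-off concrete, here is a minimal Java sketch. It assumes a plain in-memory model rather than Accumulo's client API: TreeMaps stand in for the metadata table and for a time index whose rowids are timestamps, and "entries read" is used as a rough cost. The class and method names are hypothetical. It compares a full scan with a timestamp filter against the two-step index lookup, including the case where every row spans the full range of timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical in-memory model, not Accumulo code: a main table scanned with a
 * timestamp filter versus a secondary index whose row ids are timestamps.
 */
public class TimeIndexSketch {
    // Main (metadata) table: rowid -> (timestamp -> value).
    final TreeMap<String, TreeMap<Long, String>> main = new TreeMap<>();
    // Index table: timestamp (the index rowid) -> main-table rowids written at that time.
    final TreeMap<Long, Set<String>> index = new TreeMap<>();
    // Entries read by the most recent query, used as a rough cost measure.
    long examined;

    void put(String row, long ts, String value) {
        main.computeIfAbsent(row, r -> new TreeMap<>()).put(ts, value);
        index.computeIfAbsent(ts, t -> new TreeSet<>()).add(row);
    }

    /** Filter approach: read every entry of the main table, keep those in [start, end]. */
    List<String> scanWithFilter(long start, long end) {
        examined = 0;
        List<String> out = new ArrayList<>();
        for (var row : main.entrySet()) {
            for (long ts : row.getValue().keySet()) {
                examined++;
                if (ts >= start && ts <= end) out.add(row.getKey() + "@" + ts);
            }
        }
        return out;
    }

    /** Index approach: range scan over the index, then read each referenced main row. */
    List<String> scanWithIndex(long start, long end) {
        examined = 0;
        Set<String> rows = new TreeSet<>();
        for (Set<String> ids : index.subMap(start, true, end, true).values()) {
            examined += ids.size(); // one index entry per (timestamp, rowid) pair
            rows.addAll(ids);
        }
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            // A row can also hold timestamps outside the range, so filter again here.
            for (long ts : main.get(row).keySet()) {
                examined++;
                if (ts >= start && ts <= end) out.add(row + "@" + ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Selective case: each row written once, at its own timestamp.
        TimeIndexSketch a = new TimeIndexSketch();
        for (int i = 0; i < 100; i++) a.put(String.format("row%03d", i), i, "v");
        a.scanWithFilter(10, 14);
        long filterCost = a.examined;
        a.scanWithIndex(10, 14);
        System.out.println("selective: filter=" + filterCost + " index=" + a.examined); // 100 vs 10

        // The caveat: every row spans the full range of timestamps.
        TimeIndexSketch b = new TimeIndexSketch();
        for (int i = 0; i < 100; i++)
            for (long ts = 1; ts <= 10; ts++) b.put(String.format("row%03d", i), ts, "v");
        b.scanWithFilter(3, 4);
        filterCost = b.examined;
        b.scanWithIndex(3, 4);
        System.out.println("full-range rows: filter=" + filterCost + " index=" + b.examined); // 1000 vs 1200
    }
}
```

In the second case the index returns every rowid, so the two-step lookup reads more entries than simply filtering the main table. Real costs also depend on seeks, network round trips, and data layout, which this model ignores.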

Also, the TimestampFilter is not optimized.  Filters evaluate each
key/value pair to see if it is accepted (in this case, if it is in a
timestamp range).  If there are a lot of timestamps for each cell (keys
that are identical except for timestamp), it would be better to use seeking
instead.  That would involve writing a new iterator.  If there aren't many
timestamps for each cell, seeking won't help and the TimestampFilter will
be fine.
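A rough way to see why seeking only pays off when cells have many timestamps is the Java sketch below. It is a hypothetical in-memory model, not Accumulo's SortedKeyValueIterator API: keys are sorted the way Accumulo sorts them (row, column, then newest timestamp first), a binary search stands in for a seek, and the names Key, filter, and seek are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model, not Accumulo's iterator API: keys sorted by row, column,
 * then newest timestamp first, filtered by timestamp either by testing every
 * key or by seeking within each cell.
 */
public class TimestampSeekSketch {
    record Key(String row, String col, long ts) {}
    record Result(List<Key> hits, long examined) {}

    static int compare(Key a, Key b) {
        int c = a.row().compareTo(b.row());
        if (c != 0) return c;
        c = a.col().compareTo(b.col());
        if (c != 0) return c;
        return Long.compare(b.ts(), a.ts()); // newer timestamps sort first
    }

    static boolean sameCell(Key a, Key b) {
        return a.row().equals(b.row()) && a.col().equals(b.col());
    }

    /** Index of the first key >= target (a binary search standing in for a seek). */
    static int lowerBound(List<Key> keys, Key target) {
        int lo = 0, hi = keys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(keys.get(mid), target) < 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Filter style: look at every key and accept those with start <= ts <= end. */
    static Result filter(List<Key> keys, long start, long end) {
        List<Key> hits = new ArrayList<>();
        for (Key k : keys) if (k.ts() >= start && k.ts() <= end) hits.add(k);
        return new Result(hits, keys.size());
    }

    /** Seek style: per cell, jump to the first ts <= end, read until ts < start, jump to the next cell. */
    static Result seek(List<Key> keys, long start, long end) {
        List<Key> hits = new ArrayList<>();
        long examined = 0;
        int i = 0;
        while (i < keys.size()) {
            Key cell = keys.get(i); // reading this key tells us which cell comes next
            examined++;
            i = lowerBound(keys, new Key(cell.row(), cell.col(), end));
            while (i < keys.size() && sameCell(keys.get(i), cell)) {
                Key k = keys.get(i);
                examined++;
                if (k.ts() < start) break; // older than the range: stop reading this cell
                hits.add(k);
                i++;
            }
            // Skip the rest of the cell; assumes no real key has ts == Long.MIN_VALUE.
            i = lowerBound(keys, new Key(cell.row(), cell.col(), Long.MIN_VALUE));
        }
        return new Result(hits, examined);
    }

    /** Builds `cells` cells, each holding timestamps 1..tsPerCell, in sorted order. */
    static List<Key> build(int cells, int tsPerCell) {
        List<Key> keys = new ArrayList<>();
        for (int c = 0; c < cells; c++)
            for (long ts = 1; ts <= tsPerCell; ts++)
                keys.add(new Key(String.format("row%03d", c), "attr", ts));
        keys.sort(TimestampSeekSketch::compare);
        return keys;
    }
}
```

With 10 cells of 1000 timestamps each and the range [500, 502], the filter reads all 10000 keys while the seek version reads 5 per cell; with one timestamp per cell, the seek version reads at least as many keys as the filter. The count ignores the cost of each seek itself, which is not free on a real tablet server.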

Billie

> *From:* Billie Rinaldi [mailto:[EMAIL PROTECTED]]
> *Sent:* Tuesday, August 28, 2012 11:46
> *To:* [EMAIL PROTECTED]; [EMAIL PROTECTED]
> *Subject:* Re: TimeSpan Iterator
>
> On Tue, Aug 28, 2012 at 6:33 AM, John Armstrong <[EMAIL PROTECTED]> wrote:
>
>> On 08/28/2012 09:26 AM, [EMAIL PROTECTED] wrote:
>>
>>> Does anyone know of a TimeSpan Iterator that will fetch rows based on
>>> the Accumulo timestamp?
>>
>> We actually wrote our own TimestampRangeIterator and TimestampSetIterator
>> classes.  I don't know if 1.4 has any in the core libraries.  It's not very
>> hard though.
>
>
> There's a TimestampFilter in org.apache.accumulo.core.iterators.user in
> 1.4.  It uses a range of timestamps.  Users should be aware that this is
> not an efficient operation, though.
>
> Billie
>
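The TimestampRangeIterator and TimestampSetIterator classes mentioned in the thread are not shown, so the following is only a hypothetical Java sketch of the per-key acceptance test each kind of iterator might apply: a range check with optionally exclusive bounds (the inclusive/exclusive flags are an assumption here) and a membership check against an explicit set of timestamps. It is not the TimestampFilter implementation.

```java
import java.util.List;
import java.util.Set;
import java.util.function.LongPredicate;
import java.util.stream.Collectors;

/** Hypothetical per-key acceptance tests for range- and set-based timestamp filtering. */
public class TimestampAcceptSketch {

    /** Range test: accept start..end, each bound inclusive or exclusive as requested. */
    static LongPredicate range(long start, boolean startInclusive, long end, boolean endInclusive) {
        return ts -> (startInclusive ? ts >= start : ts > start)
                && (endInclusive ? ts <= end : ts < end);
    }

    /** Set test: accept only the listed timestamps, which need not be contiguous. */
    static LongPredicate set(Set<Long> allowed) {
        return ts -> allowed.contains(ts); // ts is boxed to Long for the lookup
    }

    /** Keeps the timestamps a predicate accepts, as a filter would do key by key. */
    static List<Long> apply(List<Long> timestamps, LongPredicate accept) {
        return timestamps.stream().filter(t -> accept.test(t)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> ts = List.of(5L, 10L, 15L, 20L, 25L);
        System.out.println(apply(ts, range(10, true, 20, false))); // [10, 15]
        System.out.println(apply(ts, set(Set.of(5L, 25L))));       // [5, 25]
    }
}
```

A set of scattered timestamps cannot be expressed as one contiguous range, which is presumably why a separate set-based class would be useful; both variants still test every key unless they also seek.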