Accumulo >> mail # user >> Using Iterator To Toss Unchanged Values


Re: Using Iterator To Toss Unchanged Values
NICE!

On Thu, Jul 12, 2012 at 11:47 AM, Billie J Rinaldi <
[EMAIL PROTECTED]> wrote:

> On Thursday, July 12, 2012 8:47:41 AM, "David Medinets" <
> [EMAIL PROTECTED]> wrote:
> > I'd like to track field level changes for a given record (say,
> > author). So I create a table without a VersioningIterator. And I
> > insert a few records:
> >
> > insert "JOHN" "ATTRIBUTE" "AGE" "34"
> > insert "JOHN" "ATTRIBUTE" "HEIGHT" "67"
> > insert "JOHN" "BOOKS" "TITLE" "THE RISE OF ACCUMULO"
> >
> > The next action is that some ingest process happens and does this:
> >
> > insert "JOHN" "ATTRIBUTE" "AGE" "34"
> >
> > Since there is no VersioningIterator, there are two AGES both with
> > "34" as the value.
> >
> > I would like a DropUnchangedValueIterator which removes the last
> > inserted record. Removing the last record lets me use the n-1
> > timestamp as a LastUpdated value for the key-value pair. But as soon
> > as a record is deleted, the previous records are not available
> > anymore? What if the timestamp is set to MAX-timestamp so the records
> > are sorted backwards? Does that avoid the blocking tombstones? I'd
> > look at the source code before asking but I don't have that luxury for
> > the next week or two and the question is rattling around my head.
>
> This is mixing the idea of a deletion entry, which removes all earlier
> entries, and the idea that iterators can arbitrarily filter out
> entries.  I don't think reversing the timestamp will help you much in this
> case; what you want is an iterator that does pairwise comparisons of
> entries, and if the values are the same keep one entry with the earlier
> timestamp (then keep comparing entries for that record), and if the values
> are different keep one entry with the later timestamp (then skip to the
> next record).  I think you'll have to write a custom iterator for that.
>
> Billie
>
>
> > Naturally, I could query the database before the ingest insert. But,
> > referring to slide 19 in Adam's presentation at
> > http://people.apache.org/~afuchs/slides/accumulo_table_design.pdf, the
> > read-modify-write design is not optimal.
>
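
The pairwise comparison Billie describes can be sketched in plain Java, outside of Accumulo. This is only an illustration under stated assumptions, not a working iterator: the DropUnchangedSketch class, its Entry type, and the collapse method are hypothetical names, the timestamps and the older "33" value are made up, and the input is assumed to arrive sorted by key and then newest timestamp first, which is how Accumulo orders versions of the same key. A real implementation would need to be written against Accumulo's iterator interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the iterator logic; this is not an Accumulo API.
class DropUnchangedSketch {

    // Simplified entry: "key" stands for row + column family + column qualifier.
    static final class Entry {
        final String key;
        final long timestamp;
        final String value;

        Entry(String key, long timestamp, String value) {
            this.key = key;
            this.timestamp = timestamp;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + " @" + timestamp + " = " + value;
        }
    }

    // Assumes entries are sorted by key, then by timestamp descending.
    // Emits one entry per key: the current value, stamped with the earliest
    // timestamp in the most recent run of identical values.
    static List<Entry> collapse(List<Entry> sorted) {
        List<Entry> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            Entry candidate = sorted.get(i++);
            boolean changed = false;
            while (i < sorted.size() && sorted.get(i).key.equals(candidate.key)) {
                Entry older = sorted.get(i++);
                if (!changed && older.value.equals(candidate.value)) {
                    // Same value: keep the entry with the earlier timestamp.
                    candidate = older;
                } else {
                    // Value differs: keep the later entry, skip the rest of this key.
                    changed = true;
                }
            }
            out.add(candidate);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
            new Entry("JOHN ATTRIBUTE AGE", 30, "34"),    // re-ingest, unchanged
            new Entry("JOHN ATTRIBUTE AGE", 20, "34"),    // value first set to 34
            new Entry("JOHN ATTRIBUTE AGE", 10, "33"),    // older, different value
            new Entry("JOHN ATTRIBUTE HEIGHT", 10, "67"));
        for (Entry e : collapse(entries)) {
            System.out.println(e);
        }
    }
}
```

With this sample input, the AGE entry that survives carries timestamp 20, the point at which the value last actually changed, so the repeated insert at timestamp 30 is tossed and the surviving timestamp can serve as the LastUpdated value.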

--
Corey Nolet
Senior Software Engineer
TexelTek, inc.
[Office] 301.880.7123
[Cell] 410-903-2110