Accumulo >> mail # user >> Using Iterator To Toss Unchanged Values


David Medinets 2012-07-12, 12:47
William Slacum 2012-07-12, 13:02
Marc Parisi 2012-07-12, 13:24
Billie J Rinaldi 2012-07-12, 15:47

Re: Using Iterator To Toss Unchanged Values
NICE!

On Thu, Jul 12, 2012 at 11:47 AM, Billie J Rinaldi <
[EMAIL PROTECTED]> wrote:

> On Thursday, July 12, 2012 8:47:41 AM, "David Medinets" <
> [EMAIL PROTECTED]> wrote:
> > I'd like to track field level changes for a given record (say,
> > author). So I create a table without a VersioningIterator. And I
> > insert a few records:
> >
> > insert "JOHN" "ATTRIBUTE" "AGE" "34"
> > insert "JOHN" "ATTRIBUTE" "HEIGHT" "67"
> > insert "JOHN" "BOOKS" "TITLE" "THE RISE OF ACCUMULO"
> >
> > The next action is that some ingest process happens and does this:
> >
> > insert "JOHN" "ATTRIBUTE" "AGE" "34"
> >
> > Since there is no VersioningIterator, there are two AGES both with
> > "34" as the value.
> >
> > I would like a DropUnchangedValueIterator which removes the last
> > inserted record. Removing the last record lets me use the n-1
> > timestamp as a LastUpdated value for the key-value pair. But as soon
> > as a record is deleted, the previous records are not available
> > anymore? What if the timestamp is set to MAX-timestamp so the records
> > are sorted backwards? Does that avoid the blocking tombstones? I'd
> > look at the source code before asking but I don't have that luxury for
> > the next week or two and the question is rattling around my head.
>
> This is mixing the idea of a deletion entry, which removes all earlier
> entries, and the idea that iterators can arbitrarily filter out
> entries.  I don't think reversing the timestamp will help you much in this
> case; what you want is an iterator that does pairwise comparisons of
> entries, and if the values are the same keep one entry with the earlier
> timestamp (then keep comparing entries for that record), and if the values
> are different keep one entry with the later timestamp (then skip to the
> next record).  I think you'll have to write a custom iterator for that.
>
> Billie
>
>
> > Naturally, I could query the database before the ingest insert. But,
> > referring to slide 19 in Adam's presentation at
> > http://people.apache.org/~afuchs/slides/accumulo_table_design.pdf, the
> > read-modify-write design is not optimal.
>

--
Corey Nolet
Senior Software Engineer
TexelTek, inc.
[Office] 301.880.7123
[Cell] 410-903-2110
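A minimal sketch of the pairwise comparison Billie describes, written as plain Java (16+, for records) rather than as a real Accumulo iterator. It assumes the versions of each cell arrive the way Accumulo returns them when no VersioningIterator is configured: sorted by row, column family and qualifier, with the newest timestamp first. The `Entry` record and `collapseUnchanged` method are hypothetical names, column visibility is ignored, and a production version would instead implement Accumulo's `SortedKeyValueIterator` interface over `Key`/`Value` objects; that wiring is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class DropUnchangedSketch {

    /** Hypothetical, simplified stand-in for an Accumulo key/value pair (no visibility). */
    record Entry(String row, String family, String qualifier, long timestamp, String value) {
        /** True if both entries address the same cell (row, family, qualifier). */
        boolean sameCell(Entry other) {
            return row.equals(other.row)
                && family.equals(other.family)
                && qualifier.equals(other.qualifier);
        }
    }

    /**
     * Hypothetical helper. Input must be sorted by cell, newest timestamp first.
     * For each cell, emits one entry carrying the newest value and the earliest
     * timestamp of the unbroken run of that value (when it was first written).
     */
    static List<Entry> collapseUnchanged(List<Entry> sorted) {
        List<Entry> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            Entry kept = sorted.get(i); // newest version of this cell
            int j = i + 1;
            // Same value in an older version: keep the entry with the earlier timestamp.
            while (j < sorted.size() && sorted.get(j).sameCell(kept)
                    && Objects.equals(sorted.get(j).value(), kept.value())) {
                kept = sorted.get(j);
                j++;
            }
            out.add(kept);
            // Value changed (or cell ended): skip any remaining older versions of this cell.
            while (j < sorted.size() && sorted.get(j).sameCell(kept)) {
                j++;
            }
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        // David's example; the timestamps 100 and 200 are made up for illustration.
        List<Entry> scan = List.of(
            new Entry("JOHN", "ATTRIBUTE", "AGE", 200L, "34"), // re-ingested, unchanged
            new Entry("JOHN", "ATTRIBUTE", "AGE", 100L, "34"),
            new Entry("JOHN", "ATTRIBUTE", "HEIGHT", 100L, "67"),
            new Entry("JOHN", "BOOKS", "TITLE", 100L, "THE RISE OF ACCUMULO"));
        collapseUnchanged(scan).forEach(System.out::println);
    }
}
```

On David's example the two identical AGE versions collapse into one entry with timestamp 100, the time the value 34 was first written, which is the "LastUpdated" value he is after; HEIGHT and TITLE pass through unchanged. If a newer write carries a different value, that newer entry is kept and the older versions of the cell are skipped, matching the second half of Billie's description.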
David Medinets 2012-07-12, 19:09