Re: Large-scale web analytics with Accumulo (and Nutch/Gora, Pig, and Storm)
Jason Trost 2012-11-03, 12:35
The iterator is applied at scan time only, so the counts stay accurate.
We could have left out the event UUID and set the value = "1" under
normal ingest (where every record is guaranteed to be ingested only
once). Storm, however, only guarantees that each event Tuple is processed
at least once, so the same event can be ingested more than once and a
plain "value = 1" sum would overcount. For this application we can't
tolerate inaccurate counts. That's why we roll it up at scan time. Does
this make sense?
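
To make the difference concrete, here is a minimal sketch in plain Python. It is only a simulation, not Accumulo or Storm API code: the dict standing in for the table, the (group key, event UUID) key layout, the function names, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

def ingest(table, group_key, event_uuid):
    # Hypothetical layout: one entry per (group_key, event_uuid).
    # A redelivered tuple rewrites the same entry instead of adding a new one.
    table[(group_key, event_uuid)] = "1"

def scan_counts(table):
    # Scan-time rollup: count the distinct event UUIDs under each group key.
    counts = defaultdict(int)
    for group_key, _event_uuid in sorted(table):
        counts[group_key] += 1
    return dict(counts)

def naive_counts(deliveries):
    # "value = 1" summed per delivery: every redelivery adds one more.
    counts = defaultdict(int)
    for group_key, _event_uuid in deliveries:
        counts[group_key] += 1
    return dict(counts)

# Simulated at-least-once delivery: uuid-1 arrives twice.
deliveries = [
    ("US|chrome", "uuid-1"),
    ("US|chrome", "uuid-2"),
    ("US|chrome", "uuid-1"),  # redelivered tuple
    ("DE|firefox", "uuid-3"),
]

table = {}
for group_key, event_uuid in deliveries:
    ingest(table, group_key, event_uuid)

dedup = scan_counts(table)        # duplicates collapse on the UUID
naive = naive_counts(deliveries)  # duplicates inflate the count
print(dedup)
print(naive)
```

Keeping the UUID in the key makes ingest idempotent, so replays cost nothing in accuracy; the price is that the count has to be computed when the data is read.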
On Fri, Nov 2, 2012 at 11:45 PM, David Medinets <[EMAIL PROTECTED]> wrote:
> Unfortunately I had to leave the meetup during the middle of John's
> presentation to catch the ferry over to New Jersey. I wish I had been able
> to stay. I am curious about slide 11 which describes ingest and a scan
> time iterator. What happens during compaction? And why not ingest
> directly into the "value = 1" format? I like the "group by fields" row
> id - the name so neatly encapsulates the concept.
> On Fri, Nov 2, 2012 at 9:43 PM, Jason Trost <[EMAIL PROTECTED]> wrote:
> > Large-scale web analytics with Accumulo (and Nutch/Gora, Pig, and Storm)
> > http://www.slideshare.net/jasontrost/accumulo-at-endgame