NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Accumulo >> mail # user >> Re: Using Accumulo To Calculate Seven Day Rolling Average


Re: Using Accumulo To Calculate Seven Day Rolling Average
You could use a combiner for values that match the same day, and then roll
off whole days. This could be used along with a scan-time combiner to do
averages across multiple days.

Alternatively, s/day/hour/g or s/day/minute/g.

Exponentially weighted moving averages might also be cool to do in a
combiner:
http://en.wikipedia.org/wiki/Exponential_decay
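One way an exponentially weighted average could fit a combiner, sketched under the assumption of continuous exponential decay with rate `lambda` (half-life ln 2 / lambda). `DecayedAverage` is a hypothetical class, not an Accumulo API. It keeps a decayed sum and a decayed weight at a reference time; merging two partial states rescales both to the later time and adds them, so the result does not depend on how values were grouped. That property matters because a combiner can be configured at scan time and at compaction time, where it sees values in different groupings.

```java
// Hypothetical mergeable state for an exponentially decayed average.
// sum and weight are both expressed "as of" time t; a contribution of age a
// is scaled by exp(-lambda * a), so merging is associative and commutative.
final class DecayedAverage {
    final double sum;    // decayed sum of observed values
    final double weight; // decayed count of observations
    final double t;      // reference time at which sum and weight are expressed

    DecayedAverage(double sum, double weight, double t) {
        this.sum = sum;
        this.weight = weight;
        this.t = t;
    }

    // A single observation of 'value' at time 't'.
    static DecayedAverage of(double value, double t) {
        return new DecayedAverage(value, 1.0, t);
    }

    // Bring both partial states to the later reference time, then add them.
    static DecayedAverage merge(DecayedAverage a, DecayedAverage b, double lambda) {
        double t = Math.max(a.t, b.t);
        double fa = Math.exp(-lambda * (t - a.t));
        double fb = Math.exp(-lambda * (t - b.t));
        return new DecayedAverage(a.sum * fa + b.sum * fb,
                                  a.weight * fa + b.weight * fb,
                                  t);
    }

    // Exponentially weighted average as of time t.
    double average() {
        return sum / weight;
    }
}
```

With `lambda = Math.log(2)`, an observation one time unit old counts half as much as one observed now, and merging `(a, b)` then `c` gives the same average as merging `a` with `(b, c)`.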

Cheers,
Adam
On Fri, May 18, 2012 at 9:21 PM, David Medinets <[EMAIL PROTECTED]> wrote:

> I'm replying a little late, but Combiners replace the original values.
> Therefore, I don't think they can be used to calculate the kind of
> rolling averages I am calculating. There are other kinds of moving
> averages that don't depend on historical data, but frankly I don't
> remember their names.
>
> On Thu, Apr 12, 2012 at 10:25 PM, Billie J Rinaldi
> <[EMAIL PROTECTED]> wrote:
> > You could alternatively use a Combiner like the following to calculate
> > the average (though I haven't tested this bit of code). You would
> > configure this as a scan-time iterator (either a persistent scan iterator
> > for the table, or attached to a particular Scanner) and would use the
> > STRING encoding type of the LongCombiner. Not that it would necessarily
> > be better to use a Combiner to average together 7 things, but I thought
> > it would make a good example.
> >
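> > import java.util.Iterator;
> >
> > import org.apache.accumulo.core.data.Key;
> > import org.apache.accumulo.core.iterators.LongCombiner;
> >
> > // Averages all values that share a key (row, family, qualifier,
> > // visibility). Meant for scan time only: the output replaces the
> > // inputs, so averaging it again would yield a mean of means.
> > public class AveragingCombiner extends LongCombiner {
> >   @Override
> >   public Long typedReduce(Key key, Iterator<Long> iter) {
> >     long sum = 0;
> >     long count = 0;
> >     while (iter.hasNext()) {
> >       sum = safeAdd(sum, iter.next()); // overflow-safe add from LongCombiner
> >       count++;
> >     }
> >     // Integer division: the average is truncated toward zero.
> >     return sum / count;
> >   }
> > }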
> >
> > Billie
> >
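A small standalone check of why the averaging combiner above is described as a scan-time iterator. `MeanOfMeans.reduce` is a hypothetical copy of the body of `typedReduce` with the Accumulo types removed. Averaging all values in one pass differs from averaging partial averages, which is what would happen if the combiner's output (which replaces its inputs) were averaged again, for example by the same combiner configured at compaction time.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for AveragingCombiner.typedReduce, without Accumulo types.
final class MeanOfMeans {
    static long reduce(Iterator<Long> iter) {
        long sum = 0;
        long count = 0;
        while (iter.hasNext()) {
            sum += iter.next();
            count++;
        }
        return sum / count; // integer division, as in the combiner above
    }

    // Average all values in a single reduce call (the scan-time case).
    static long singlePass(List<Long> values) {
        return reduce(values.iterator());
    }

    // Average two groups separately, then average the partial results,
    // which is what re-combining already-combined values would do.
    static long twoPass(List<Long> first, List<Long> second) {
        long a = reduce(first.iterator());
        long b = reduce(second.iterator());
        return reduce(Arrays.asList(a, b).iterator());
    }
}
```

For the values 1, 2, 3 and 10, one pass gives 16 / 4 = 4, while averaging the groups {1, 2, 3} and {10} first gives (2 + 10) / 2 = 6.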
> >
> > ----- Original Message -----
> >> From: "David Medinets" <[EMAIL PROTECTED]>
> >> To: [EMAIL PROTECTED]
> >> Sent: Wednesday, April 11, 2012 10:59:46 PM
> >> Subject: Using Accumulo To Calculate Seven Day Rolling Average
> >> Thanks. Using this technique seems to work. I wrote a blog entry to
> >> document it:
> >>
> >> Using Accumulo To Calculate Seven Day Rolling Average
> >>
> >> http://affy.blogspot.com/2012/04/using-accumulo-to-calculate-seven-day.html
> >>
> >> On Wed, Apr 11, 2012 at 2:20 PM, Adam Fuchs <[EMAIL PROTECTED]> wrote:
> >> > David,
> >> >
> >> > In case of continuing confusion, I think it's best if you ignore
> >> > Bill's suggestion for now and heed Josh's advice. Bill's suggestion
> >> > might be an optimization to look at later on, but your initial
> >> > approach seems sound.
> >> >
> >> > Adam
> >> >
> >> >
> >> >
> >> > On Tue, Apr 10, 2012 at 10:52 PM, David Medinets <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> I thought there were issues associated with doing mutations inside
> >> >> iterators?
> >> >>
> >> >> On Tue, Apr 10, 2012 at 10:35 PM, William Slacum <[EMAIL PROTECTED]> wrote:
> >> >> > I don't think you'd necessarily need an aggregator for that,
> >> >> > although it doesn't seem like that's what you're doing here in
> >> >> > the first place. Wouldn't it be easier to set a summation iterator
> >> >> > that also keeps a count of observations to do some server-side
> >> >> > math, and then combine it all on the client? That way you can have
> >> >> > a time series, and to get weekly averages you just change your
> >> >> > scan range.
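A sketch of the sum-and-count approach described here, assuming one stored value per day encoded as the string `sum,count`. The encoding and the `SumCount` class are hypothetical, not an Accumulo format or API. Because adding pairs field by field is associative, the server can pre-aggregate and the client can finish the job; `rangeAverage` uses a sorted map's sub-range to stand in for changing the scan range.

```java
import java.util.NavigableMap;

// Hypothetical "sum,count" value, encoded as a string so a summation-style
// iterator could add pairs field by field. Not an Accumulo class.
final class SumCount {
    final long sum;
    final long count;

    SumCount(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    static SumCount parse(String encoded) {
        String[] parts = encoded.split(",");
        return new SumCount(Long.parseLong(parts[0]), Long.parseLong(parts[1]));
    }

    String encode() {
        return sum + "," + count;
    }

    // Adding pairs is associative, so partial results can be merged on the
    // server and again on the client without losing information.
    SumCount plus(SumCount other) {
        return new SumCount(sum + other.sum, count + other.count);
    }

    // Client-side average over days [fromDay, toDay]; choosing a different
    // range plays the role of changing the scan range.
    static double rangeAverage(NavigableMap<Long, String> byDay, long fromDay, long toDay) {
        SumCount total = new SumCount(0, 0);
        for (String v : byDay.subMap(fromDay, true, toDay, true).values()) {
            total = total.plus(parse(v));
        }
        return total.count == 0 ? Double.NaN : (double) total.sum / total.count;
    }
}
```

Picking days 1 to 7 gives one weekly average; days 2 to 8 gives the next rolling window, with no change to what is stored.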
> >> >> > On Apr 10, 2012, at 10:16 PM, David Medinets wrote:
> >> >> >
> >> >> >> I'm still thinking about how to use Accumulo to calculate weekly
> >> >> >> moving averages. I thought that using the maxVersions setting
> >> >> >> might work to maintain the last 7 values. Then a program could
> >> >> >> simply sum the values of a given row. So this is what I did:
> >> >> >>
> >> >> >> bin/accumulo shell -u root -p password
> >> >> >>> createtable rolling
> >> >> >> rolling> config -t rolling -s table.iterator.scan.vers.opt.maxVersions=7
> >> >> >> rolling> insert row cf cq 1
> >> >> >> rolling> insert row cf cq 2
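The maxVersions approach above can be modeled as follows, assuming values for one row and column arrive in timestamp order. `LastNVersions` is a hypothetical class that mimics what `maxVersions=7` keeps (the seven newest versions) and what a client would compute after scanning them; it does not call Accumulo. One thing worth checking against the documentation for your Accumulo version: new tables typically also configure the versioning iterator at the minc and majc scopes with maxVersions=1, and if so, setting only the scan-scope option could let compactions discard the older versions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of "keep only the newest N versions, then average them".
// It imitates the effect of maxVersions=7 on one row/column; no Accumulo calls.
final class LastNVersions {
    private final Deque<Long> newestFirst = new ArrayDeque<>();
    private final int maxVersions;

    LastNVersions(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Assumption: values are inserted in timestamp order, so the newest goes
    // to the front and the oldest falls off the back once N is exceeded.
    void insert(long value) {
        newestFirst.addFirst(value);
        if (newestFirst.size() > maxVersions) {
            newestFirst.removeLast();
        }
    }

    // What the client would compute after scanning the retained versions.
    double average() {
        long sum = 0;
        for (long v : newestFirst) {
            sum += v;
        }
        return newestFirst.isEmpty() ? Double.NaN : (double) sum / newestFirst.size();
    }
}
```

After inserting 1 through 9 with `maxVersions = 7`, only 3 through 9 remain, so the average is 42 / 7 = 6.0.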