Accumulo, mail # user - Re: Using Accumulo To Calculate Seven Day Rolling Average


Re: Using Accumulo To Calculate Seven Day Rolling Average
Adam Fuchs 2012-05-19, 01:42
You could use a combiner for values that match the same day, and then roll
off whole days. This could be used along with a scan-time combiner to do
averages across multiple days.
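
To make that concrete, here is a rough sketch in plain Java (no Accumulo classes involved) of the bookkeeping this would amount to: one (sum, count) bucket per day, whole days rolled off once they leave the window, and an average computed over the remaining buckets at read time. The class name DailyBuckets and its methods are made up for illustration, and it assumes the average is taken over all observations in the window rather than over daily totals.

```java
import java.util.TreeMap;

// Hypothetical helper, not an Accumulo API: one {sum, count} bucket per day.
final class DailyBuckets {
    // Key: day number (for example, days since the epoch). Value: {sum, count}.
    private final TreeMap<Long, long[]> buckets = new TreeMap<>();

    // Fold one observation into its day's bucket, as a per-day combiner would.
    void add(long day, long value) {
        long[] b = buckets.computeIfAbsent(day, d -> new long[2]);
        b[0] = Math.addExact(b[0], value);
        b[1]++;
    }

    // Roll off whole days strictly older than firstDayToKeep.
    void expireBefore(long firstDayToKeep) {
        buckets.headMap(firstDayToKeep, false).clear();
    }

    // Average of all observations in the window [lastDay - days + 1, lastDay],
    // which is the part a scan-time step would compute.
    double average(long lastDay, int days) {
        long sum = 0;
        long count = 0;
        for (long[] b : buckets.subMap(lastDay - days + 1, true, lastDay, true).values()) {
            sum = Math.addExact(sum, b[0]);
            count += b[1];
        }
        if (count == 0) {
            throw new IllegalStateException("no observations in window");
        }
        return (double) sum / count;
    }
}
```

Swapping the day number for an hour or minute number gives the finer-grained variants without changing the structure.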

Alternatively, s/day/hour/g or s/day/minute/g.

Exponentially weighted moving averages might also be cool to do in a
combiner:
http://en.wikipedia.org/wiki/Exponential_decay
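
For what it's worth, the update rule is small enough to sketch in plain Java. This is only an illustration: the class name Ewma and the smoothing factor alpha are my own choices, not anything from Accumulo. The point is that each update needs only the previous average and the new value, so replacing the original values is not a problem here. Inside a real combiner the values would still have to be applied in time order, which this sketch simply assumes.

```java
// Hypothetical helper, not an Accumulo API: exponentially weighted moving average.
final class Ewma {
    private final double alpha;   // weight given to the newest observation, in (0, 1]
    private double value;         // current average
    private boolean seeded;       // false until the first observation arrives

    Ewma(double alpha) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    // Fold in one observation and return the updated average.
    // The first observation seeds the average directly.
    double update(double x) {
        if (!seeded) {
            value = x;
            seeded = true;
        } else {
            value = alpha * x + (1.0 - alpha) * value;
        }
        return value;
    }
}
```

A larger alpha tracks recent values more closely; a smaller one smooths more heavily.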

Cheers,
Adam
On Fri, May 18, 2012 at 9:21 PM, David Medinets <[EMAIL PROTECTED]> wrote:

> I'm replying a little late but Combiners replace the original values.
> Therefore, I don't think they can be used to calculate the kind of
> rolling averages I am calculating. There are other kinds of moving
> averages that don't depend on historical data but frankly I don't
> remember their names.
>
> On Thu, Apr 12, 2012 at 10:25 PM, Billie J Rinaldi
> <[EMAIL PROTECTED]> wrote:
> > You could alternatively use a Combiner like the following to calculate
> > the average (though I haven't tested this bit of code).  You would
> > configure this as a scan-time iterator (either a persistent scan iterator
> > for the table, or attached to a particular Scanner) and would use the
> > STRING encoding type of the LongCombiner.  Not that it would necessarily be
> > better to use a Combiner to average together 7 things, but I thought it
> > would make a good example.
> >
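> > import java.util.Iterator;
> > import org.apache.accumulo.core.data.Key;
> > import org.apache.accumulo.core.iterators.LongCombiner;
> >
> > public class AveragingCombiner extends LongCombiner {
> >  @Override
> >  public Long typedReduce(Key key, Iterator<Long> iter) {
> >    long sum = 0;
> >    long count = 0;
> >    // Sum every value seen for this key, counting as we go.
> >    while (iter.hasNext()) {
> >      sum = safeAdd(sum, iter.next());
> >      count++;
> >    }
> >    // Integer division: the result is truncated, not rounded.
> >    return sum/count;
> >  }
> > }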
> >
> > Billie
> >
> >
> > ----- Original Message -----
> >> From: "David Medinets" <[EMAIL PROTECTED]>
> >> To: [EMAIL PROTECTED]
> >> Sent: Wednesday, April 11, 2012 10:59:46 PM
> >> Subject: Using Accumulo To Calculate Seven Day Rolling Average
> >> Thanks. Using this technique seems to work. I wrote a blog entry to
> >> document it:
> >>
> >> Using Accumulo To Calculate Seven Day Rolling Average
> >>
> >> http://affy.blogspot.com/2012/04/using-accumulo-to-calculate-seven-day.html
> >>
> >> On Wed, Apr 11, 2012 at 2:20 PM, Adam Fuchs <[EMAIL PROTECTED]>
> >> wrote:
> >> > David,
> >> >
> >> > In case of continuing confusion, I think it's best if you ignore Bill's
> >> > suggestion for now and heed Josh's advice. Bill's suggestion might be an
> >> > optimization to look at later on, but your initial approach seems sound.
> >> >
> >> > Adam
> >> >
> >> >
> >> >
> >> > On Tue, Apr 10, 2012 at 10:52 PM, David Medinets
> >> > <[EMAIL PROTECTED]>
> >> > wrote:
> >> >>
> >> >> I thought there were issues associated with doing mutations inside
> >> >> iterators?
> >> >>
> >> >> On Tue, Apr 10, 2012 at 10:35 PM, William Slacum
> >> >> <[EMAIL PROTECTED]>
> >> >> wrote:
> >> >> > I don't think you'd necessarily need an aggregator for that, although
> >> >> > it doesn't seem like that's what you're doing here in the first place.
> >> >> > Wouldn't it be easier to set a summation iterator that also keeps a
> >> >> > count of observations to do some server-side math and then combine it
> >> >> > all on the client? That way you can have a time series, and to get
> >> >> > weekly averages you just change your scan range.
> >> >> > On Apr 10, 2012, at 10:16 PM, David Medinets wrote:
> >> >> >
> >> >> >> I'm still thinking about how to use Accumulo to calculate weekly
> >> >> >> moving averages. I thought that using the maxVersions setting might
> >> >> >> work to maintain the last 7 values. Then a program could simply sum
> >> >> >> the values of a given row. So this is what I did:
> >> >> >>
> >> >> >> bin/accumulo shell -u root -p password
> >> >> >> > createtable rolling
> >> >> >> rolling> config -t rolling -s table.iterator.scan.vers.opt.maxVersions=7
> >> >> >> rolling> insert row cf cq 1
> >> >> >> rolling> insert row cf cq 2