-
Using maxVersions=7 but rows disappear
David Medinets 2012-04-11, 02:16
I'm still thinking about how to use Accumulo to calculate weekly moving averages. I thought that using the maxVersions setting might work to maintain the last 7 values. Then a program could simply sum the values of a given row. So this is what I did:
bin/accumulo shell -u root -p password
> createtable rolling
rolling> config -t rolling -s table.iterator.scan.vers.opt.maxVersions=7
rolling> insert row cf cq 1
rolling> insert row cf cq 2
rolling> insert row cf cq 3
rolling> insert row cf cq 4
rolling> insert row cf cq 5
rolling> insert row cf cq 6
rolling> insert row cf cq 7
rolling> insert row cf cq 8
rolling> scan
row cf:cq [] 8
row cf:cq [] 7
row cf:cq [] 6
row cf:cq [] 5
row cf:cq [] 4
row cf:cq [] 3
row cf:cq [] 2
This is exactly what I wanted to see. So I wrote a simple scanner program to read the table. Then I did another scan:
rolling> scan
row cf:cq [] 8
Where did the rest of the records go?
-
Re: Using maxVersions=7 but rows disappear
William Slacum 2012-04-11, 02:35
I don't think you'd necessarily need an aggregator for that, although it doesn't seem like that's what you're doing here in the first place. Wouldn't it be easier to set a summation iterator that also keeps a count of observations, do some of the math server side, and then combine it all on the client? That way you have a time series, and to get weekly averages you just change your scan range.
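To make the sum-plus-count idea above concrete, here is a minimal Python sketch of the arithmetic only. It does not use the Accumulo API. The "sum,count" value encoding, the one-row-per-day layout, and the function names are all assumptions made for illustration; a real table would do the first step in a server-side iterator.

```python
from typing import Dict, Iterable, Tuple


def combine_observations(values: Iterable[float]) -> str:
    """Stand-in for the server-side step: fold raw observations for one
    key into a single "sum,count" value (hypothetical encoding)."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return f"{total},{count}"


def decode(value: str) -> Tuple[float, int]:
    """Parse a "sum,count" value back into numbers."""
    total, count = value.split(",")
    return float(total), int(count)


def average_over_range(daily: Dict[str, str], start: str, end: str) -> float:
    """Client-side step: merge the partial sums for every day in
    [start, end] and divide once at the end.

    Days are ISO date strings, so string comparison orders them correctly,
    much like a scan range over sorted row keys would.
    """
    total = 0.0
    count = 0
    for day in sorted(daily):
        if start <= day <= end:
            day_total, day_count = decode(daily[day])
            total += day_total
            count += day_count
    if count == 0:
        raise ValueError("no observations in the requested range")
    return total / count
```

Carrying the count alongside the sum is what makes the partial results mergeable: averaging per-day averages would weight a day with one observation the same as a day with ten. Changing the weekly window is then only a matter of passing a different start and end.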
-
Re: Using maxVersions=7 but rows disappear
Josh Elser 2012-04-11, 02:36
David,
I'd venture a guess that because you only set the scan maxVersions, when Accumulo minor compacted your 'rolling' table to flush those K/V pairs to disk, it deleted your first 6 versions that you saw when performing the scan.
You can determine if this is actually what happened by running your inserts below, and calling 'compact' on the table before performing the scan.
To fix this, try setting the same option with the minc and majc scope. Most likely (but don't quote me):
config -t rolling -s table.iterator.minc.vers.opt.maxVersions=7
config -t rolling -s table.iterator.majc.vers.opt.maxVersions=7
- Josh
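The explanation above can be illustrated with a toy model in Python. This is not Accumulo code: ToyTable, keep_newest, and the integer clock are hypothetical names invented here, and the model assumes that each scope (scan, minc, majc) has its own maxVersions setting defaulting to 1, and that a compaction permanently drops whatever its scope's setting filters out.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, str]  # (timestamp, value); a higher timestamp is newer


def keep_newest(entries: List[Entry], max_versions: int) -> List[Entry]:
    """Return the newest max_versions entries, newest first."""
    return sorted(entries, key=lambda e: e[0], reverse=True)[:max_versions]


class ToyTable:
    """Toy model of one key (row cf:cq) with per-scope version limits."""

    def __init__(self, max_versions_by_scope: Optional[Dict[str, int]] = None):
        # Assumption: every scope keeps only 1 version unless overridden.
        self.max_versions = {"scan": 1, "minc": 1, "majc": 1}
        self.max_versions.update(max_versions_by_scope or {})
        self.entries: List[Entry] = []
        self.clock = 0

    def insert(self, value: str) -> None:
        """Store a new version with the next timestamp."""
        self.clock += 1
        self.entries.append((self.clock, value))

    def scan(self) -> List[str]:
        """Filter at read time only; stored data is left untouched."""
        kept = keep_newest(self.entries, self.max_versions["scan"])
        return [value for _, value in kept]

    def compact(self, scope: str = "minc") -> None:
        """Rewrite stored data through the given scope's limit.
        Versions dropped here are gone for every later scan."""
        self.entries = keep_newest(self.entries, self.max_versions[scope])


def run_scenario(scopes: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Insert 1..8, scan, compact with the minc scope, then scan again."""
    table = ToyTable(scopes)
    for v in range(1, 9):
        table.insert(str(v))
    before = table.scan()
    table.compact("minc")
    return before, table.scan()
```

In this model, run_scenario({"scan": 7}) returns seven values for the first scan and only the newest value for the second, which matches the behaviour reported in the thread. With all three scopes set to 7, both scans return the same seven values.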
-
Re: Using maxVersions=7 but rows disappear
David Medinets 2012-04-11, 02:51
I thought there were issues associated with doing mutations inside iterators?
On Tue, Apr 10, 2012 at 10:35 PM, William Slacum <[EMAIL PROTECTED]> wrote:
> I don't think you'd necessarily need an aggregator for that, although it doesn't seem like that's what you're doing here in the first place. Wouldn't it be easier to set a summation iterator that also keeps a count of observations to do some server side math and then combine it all on the client? That way you can have a time series and to get weekly averages you just change your scan range.
-
Re: Using maxVersions=7 but rows disappear
Adam Fuchs 2012-04-11, 18:20
David,
In case of continuing confusion, I think it's best if you ignore Bill's suggestion for now and heed Josh's advice. Bill's suggestion might be an optimization to look at later on, but your initial approach seems sound.
Adam