Benson Margulies 2012-02-09, 14:43
At time 0, I make a Mutation with put("a", "b", "c");
At time 1, I do it again.
Do I get:
a) two copies of the same data with different timestamps?
b) an error?
c) something else?
If the idea I'm looking for is to end up with one item without doing a scan each time to see if it's out there, is there a 'garbage collection' cliche for cleaning out redundant items that differ only in timestamp?
-
Re: 'Redundant' mutations
Aaron Cordova 2012-02-09, 14:47
You get (a).
By default, tables are configured with a "versioning iterator" that filters out all but the latest "version" of a key, meaning the key with the latest timestamp. That provides the behavior you describe: redundant keys that differ only in timestamp are cleaned out.
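To make the filtering concrete, here is a toy Java model of what a versioning filter does to keys sorted roughly the way Accumulo sorts them (row, family, qualifier ascending, then timestamp newest first; column visibility is ignored for simplicity). It is a sketch under stated assumptions: `Cell`, `keepLatest`, and the `maxVersions` parameter are hypothetical names, not Accumulo's own `Key` or iterator classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class VersioningSketch {
    // Hypothetical simplified key/value pair; NOT Accumulo's Key class.
    record Cell(String row, String family, String qualifier, long timestamp, String value) {
        boolean sameCoordinates(Cell other) {
            return row.equals(other.row) && family.equals(other.family)
                && qualifier.equals(other.qualifier);
        }
    }

    // Row, family, qualifier ascending; timestamp descending (newest first).
    static final Comparator<Cell> ORDER = Comparator.comparing(Cell::row)
        .thenComparing(Cell::family)
        .thenComparing(Cell::qualifier)
        .thenComparing(Comparator.comparingLong(Cell::timestamp).reversed());

    // Keep at most maxVersions of the newest cells for each row/family/qualifier.
    static List<Cell> keepLatest(List<Cell> cells, int maxVersions) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(ORDER);
        List<Cell> out = new ArrayList<>();
        Cell previous = null;
        int seen = 0;
        for (Cell c : sorted) {
            // Count how many versions of these coordinates we have passed so far.
            seen = (previous != null && c.sameCoordinates(previous)) ? seen + 1 : 1;
            if (seen <= maxVersions) {
                out.add(c);
            }
            previous = c;
        }
        return out;
    }

    public static void main(String[] args) {
        // put("a", "b", "c") written at time 0 and again at time 1.
        List<Cell> written = List.of(
            new Cell("a", "b", "c", 0L, "v"),
            new Cell("a", "b", "c", 1L, "v"));
        List<Cell> visible = keepLatest(written, 1);
        System.out.println(visible.size() + " " + visible.get(0).timestamp());
    }
}
```

With `maxVersions` set to 1, only the timestamp-1 cell survives, so `main` prints `1 1`.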
-
Re: 'Redundant' mutations
Benson Margulies 2012-02-09, 14:50
I understood that the default was to see only the latest version, but does disk space remain consumed by the older ones until something happens, or does it clean itself out?
-
Re: 'Redundant' mutations
Keith Turner 2012-02-09, 15:14
It depends on a few factors.
* If the two mutations were written to the same in-memory map, only one of them is written out when that map is minor compacted.
* If the two mutations were written to different in-memory maps, the data will be minor compacted to separate files. In this case the older version will not go away until a major compaction occurs (a major compaction merges multiple files and is controlled by the major compaction ratio). A major compaction can be triggered by additional data being written or by a user forcing a major compaction on the table.
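The two cases can be modeled with a toy tablet in Java: one list stands in for the in-memory map, a list of lists stands in for the files on disk, minor compaction is a flush that keeps only the newest version within the map, and major compaction is a merge that keeps only the newest version across all files. This is a sketch, not Accumulo code: every class and method name is hypothetical, and real compaction scheduling (the major compaction ratio, user-forced compactions) is not modeled; `majorCompact` is simply called directly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CompactionSketch {
    // Hypothetical simplified key; values are omitted because only versions matter here.
    record Cell(String row, String family, String qualifier, long timestamp) {}

    static final Comparator<Cell> ORDER = Comparator.comparing(Cell::row)
        .thenComparing(Cell::family)
        .thenComparing(Cell::qualifier)
        .thenComparing(Comparator.comparingLong(Cell::timestamp).reversed());

    private final List<Cell> memory = new ArrayList<>();      // stands in for the in-memory map
    private final List<List<Cell>> files = new ArrayList<>(); // stands in for files on disk

    void write(Cell c) { memory.add(c); }

    // Keep only the newest version of each row/family/qualifier (one version).
    static List<Cell> latestOnly(List<Cell> cells) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(ORDER);
        List<Cell> out = new ArrayList<>();
        for (Cell c : sorted) {
            Cell last = out.isEmpty() ? null : out.get(out.size() - 1);
            boolean sameCoords = last != null && last.row().equals(c.row())
                && last.family().equals(c.family()) && last.qualifier().equals(c.qualifier());
            if (!sameCoords) {
                out.add(c); // first (newest) cell for these coordinates
            }
        }
        return out;
    }

    // Minor compaction: flush the in-memory map to a new file, dropping older versions within it.
    void minorCompact() {
        if (memory.isEmpty()) return;
        files.add(latestOnly(memory));
        memory.clear();
    }

    // Major compaction: merge every file into one, dropping older versions across files.
    void majorCompact() {
        List<Cell> all = new ArrayList<>();
        for (List<Cell> file : files) {
            all.addAll(file);
        }
        files.clear();
        files.add(latestOnly(all));
    }

    int cellsOnDisk() {
        return files.stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        Cell t0 = new Cell("a", "b", "c", 0L);
        Cell t1 = new Cell("a", "b", "c", 1L);

        CompactionSketch sameMap = new CompactionSketch();
        sameMap.write(t0);
        sameMap.write(t1);
        sameMap.minorCompact();
        System.out.println("same map: " + sameMap.cellsOnDisk());           // 1

        CompactionSketch separateMaps = new CompactionSketch();
        separateMaps.write(t0);
        separateMaps.minorCompact();
        separateMaps.write(t1);
        separateMaps.minorCompact();
        System.out.println("separate maps: " + separateMaps.cellsOnDisk()); // 2
        separateMaps.majorCompact();
        System.out.println("after major: " + separateMaps.cellsOnDisk());   // 1
    }
}
```

In the separate-maps case the redundant cell occupies disk space in its own file until the merge, which matches the "eventually removed" behavior described in this thread.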
-
Re: 'Redundant' mutations
Aaron Cordova 2012-02-09, 15:20
Short answer: yes, these redundant keys are eventually removed from disk.