Accumulo, mail # dev - Re: [jira] [Commented] (ACCUMULO-227) Improve in memory map counts to provide cell level uniqueness for repeated columns in mutation


Re: [jira] [Commented] (ACCUMULO-227) Improve in memory map counts to provide cell level uniqueness for repeated columns in mutation
Aaron Cordova 2011-12-22, 21:49
I think it's fine to allow different versions of 'identical keys', meaning the same row, colfam, colqual, because in that case the implementation still treats two keys that differ only by timestamp as two unique keys. But I don't think we should allow multiple identical _versions_ of identical keys, to use your terminology. If the user does happen to insert them, I think we should throw all but one away; a user who wants to aggregate across values must give each update a different version number or timestamp.
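To make the proposal concrete, here is a rough sketch of "throw all but one away" for updates whose keys match in every field, timestamp included. It is plain Python rather than Accumulo code: `Key` and `drop_identical_versions` are hypothetical names, and keeping the last value written is an assumption, since nothing above says which copy should survive.

```python
from typing import NamedTuple


class Key(NamedTuple):
    # Hypothetical stand-in for a full key; not the Accumulo Key class.
    row: str
    colfam: str
    colqual: str
    timestamp: int


def drop_identical_versions(updates):
    """Keep a single value for each fully identical key.

    Assumption: the last value written wins.
    """
    kept = {}
    for key, value in updates:
        kept[key] = value  # a later identical key overwrites the earlier one
    return sorted(kept.items())


updates = [
    (Key("r1", "cf", "cq", 5), "a"),
    (Key("r1", "cf", "cq", 5), "b"),  # identical version: replaces "a"
    (Key("r1", "cf", "cq", 6), "c"),  # differs by timestamp: kept as its own version
]
result = drop_identical_versions(updates)
```

Keys that differ only by timestamp stay distinct, which matches the first half of the paragraph above.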

If generating unique timestamps within a mutation that performs several updates to the same row, colfam, colqual is a problem, why don't we allow the user to 'put()' multiple updates into a mutation and then, on the server, assign slightly different timestamps to the identical row, colfam, colqual triples found in that mutation? Would that make everyone happy?
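A minimal sketch of that server-side idea, again in plain Python rather than against the real Accumulo API. The function name `assign_timestamps`, the tuple layout, and the choice to bump the timestamp by one per repeat (so later puts get higher timestamps) are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_timestamps(puts, base_ts):
    """Give repeated (row, colfam, colqual) triples in one mutation distinct timestamps.

    puts: list of (row, colfam, colqual, value) tuples in the order they were put.
    base_ts: the timestamp the server would otherwise use for the whole mutation.
    Assumption: the n-th repeat of a triple gets base_ts + n.
    """
    seen = defaultdict(int)  # how many times each triple has appeared so far
    stamped = []
    for row, colfam, colqual, value in puts:
        triple = (row, colfam, colqual)
        stamped.append((row, colfam, colqual, base_ts + seen[triple], value))
        seen[triple] += 1
    return stamped


mutation = [
    ("r1", "cf", "cq", "1"),
    ("r1", "cf", "cq", "2"),     # same triple: gets base_ts + 1
    ("r1", "cf", "other", "3"),  # different triple: gets base_ts
]
stamped = assign_timestamps(mutation, base_ts=1000)
```

A real server would also have to make sure the bumped timestamps do not collide with those given to later mutations, which this sketch ignores.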

On Dec 22, 2011, at 4:35 PM, Keith Turner wrote:

> Big table has versions.  Does the big table paper actually describe
> the behavior of inserting two identical keys at different times when
> the table is set to show two versions?  If these keys were in two
> separate map files/sstables then something would have to make a
> decision to suppress one of them.  I am not sure the big table paper
> got that specific.  You could suppress one of the keys, or just
> consider them to be two versions.  We have been considering them to be
> versions.