Accumulo >> mail # dev >> Re: [jira] [Commented] (ACCUMULO-227) Improve in memory map counts to provide cell level uniqueness for repeated columns in mutation


And just to be clear, since there are several definitions of "key" flying around, consider the following case:

row1,colfam1,colqual1,4 -> valueA
row1,colfam1,colqual1,5 -> valueB

These can coexist peacefully, although the versioning iterator might suppress all but the k most recent versions.
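The first case can be modeled in a few lines of plain Java. This is a toy sketch, not Accumulo's versioning iterator: the hypothetical `CellKey` class drops column visibility and the delete flag, and the hypothetical `keepLatest` method keeps the first `maxVersions` entries per row, column family and column qualifier after sorting. Because Accumulo sorts timestamps from newest to oldest, "first" here means "most recent".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy model only: the key omits column visibility and the delete flag.
final class VersioningSketch {
    static final class CellKey {
        final String row;
        final String colFam;
        final String colQual;
        final long timestamp;

        CellKey(String row, String colFam, String colQual, long timestamp) {
            this.row = row;
            this.colFam = colFam;
            this.colQual = colQual;
            this.timestamp = timestamp;
        }

        // True when two keys name the same cell and differ at most by timestamp.
        boolean sameColumn(CellKey o) {
            return row.equals(o.row) && colFam.equals(o.colFam) && colQual.equals(o.colQual);
        }
    }

    // Row, family and qualifier ascend; timestamps descend, so the newest version sorts first.
    static final Comparator<CellKey> ORDER = Comparator
            .comparing((CellKey k) -> k.row)
            .thenComparing(k -> k.colFam)
            .thenComparing(k -> k.colQual)
            .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());

    // Keeps at most maxVersions entries per (row, colfam, colqual), newest first.
    static List<Map.Entry<CellKey, String>> keepLatest(List<Map.Entry<CellKey, String>> entries, int maxVersions) {
        List<Map.Entry<CellKey, String>> sorted = new ArrayList<>(entries);
        sorted.sort(Map.Entry.comparingByKey(ORDER));
        List<Map.Entry<CellKey, String>> out = new ArrayList<>();
        CellKey prev = null;
        int seen = 0;
        for (Map.Entry<CellKey, String> e : sorted) {
            CellKey k = e.getKey();
            // Restart the version count whenever a new cell begins.
            seen = (prev != null && prev.sameColumn(k)) ? seen + 1 : 1;
            if (seen <= maxVersions) {
                out.add(e);
            }
            prev = k;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<CellKey, String>> cells = List.of(
                Map.entry(new CellKey("row1", "colfam1", "colqual1", 4), "valueA"),
                Map.entry(new CellKey("row1", "colfam1", "colqual1", 5), "valueB"));
        System.out.println(keepLatest(cells, 1).get(0).getValue()); // prints valueB
        System.out.println(keepLatest(cells, 2).size()); // prints 2
    }
}
```

Both entries survive the sort as distinct keys; only the version limit decides how many are returned.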

In this case:

row1,colfam1,colqual1,4 -> valueA
row1,colfam1,colqual1,4 -> valueB

Accumulo should throw one away arbitrarily. I think what you mentioned, a system iterator that performs this logic, would be a good implementation.
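The logic such a system iterator would apply to the second case might look like the toy filter below. The names `DedupSketch` and `dropExactDuplicates` are hypothetical and this is not the Accumulo iterator API; the sketch only compares each key, timestamp included, with the previous key in sorted order and drops it when the two are equal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy model only: the key omits column visibility and the delete flag.
final class DedupSketch {
    static final class CellKey {
        final String row;
        final String colFam;
        final String colQual;
        final long timestamp;

        CellKey(String row, String colFam, String colQual, long timestamp) {
            this.row = row;
            this.colFam = colFam;
            this.colQual = colQual;
            this.timestamp = timestamp;
        }
    }

    // Row, family and qualifier ascend; timestamps descend (newest first).
    static final Comparator<CellKey> ORDER = Comparator
            .comparing((CellKey k) -> k.row)
            .thenComparing(k -> k.colFam)
            .thenComparing(k -> k.colQual)
            .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());

    // Drops every entry whose full key equals the previous key in sorted order.
    // List.sort is stable, so the surviving copy is whichever appeared first in the input.
    static List<Map.Entry<CellKey, String>> dropExactDuplicates(List<Map.Entry<CellKey, String>> entries) {
        List<Map.Entry<CellKey, String>> sorted = new ArrayList<>(entries);
        sorted.sort(Map.Entry.comparingByKey(ORDER));
        List<Map.Entry<CellKey, String>> out = new ArrayList<>();
        CellKey prev = null;
        for (Map.Entry<CellKey, String> e : sorted) {
            if (prev == null || ORDER.compare(prev, e.getKey()) != 0) {
                out.add(e);
            }
            prev = e.getKey();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<CellKey, String>> cells = List.of(
                Map.entry(new CellKey("row1", "colfam1", "colqual1", 4), "valueA"),
                Map.entry(new CellKey("row1", "colfam1", "colqual1", 4), "valueB"),
                Map.entry(new CellKey("row1", "colfam1", "colqual1", 5), "valueC"));
        for (Map.Entry<CellKey, String> e : dropExactDuplicates(cells)) {
            System.out.println(e.getKey().timestamp + " -> " + e.getValue());
        }
        // prints:
        // 5 -> valueC
        // 4 -> valueA
    }
}
```

Which duplicate survives is incidental here (input order), which is the "throw one away arbitrarily" behavior described above; keys that differ only by timestamp are untouched.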

On Dec 22, 2011, at 5:09 PM, Keith Turner wrote:

> On Thu, Dec 22, 2011 at 4:49 PM, Aaron Cordova <[EMAIL PROTECTED]> wrote:
>> I think it's fine to consider different versions of 'identical keys', meaning row,colfam,colqual, because in that case the implementation still treats two keys that only differ by timestamp as two unique keys. But I don't think we should allow multiple identical _versions_ of identical keys, to use your terminology. I think we should throw all but one away if the user does happen to try to insert them and if the user wants to aggregate across values, he or she must use different version numbers or timestamps or whatever.
>>
>> If generating unique timestamps within mutations that want to perform several updates to the same row,colfam,colqual is a problem, why don't we allow the user to 'put()' multiple updates into a mutation, and on the server then assign slightly different timestamps to the identical row,colfam,colqual triples that are found in a mutation. Would that make everyone happy?
>
> This still does not address the issue of separate mutations inserting
> the exact same key.  Also, timestamps are only set on the keys in a
> mutation if the user does not set them.
>
> So if a table comes to have multiple keys that are exactly the same,
> what do you propose?  That we drop them?  Which one will you drop?
> One nice thing about Accumulo is that if you wish to have this
> behavior, you can very easily write an iterator to do it.  I think you
> are proposing that we configure an iterator to do this by default?
>
> I think if the user is inserting things with the exact same key and
> expecting it to behave like a TreeMap (honor order of arrival), then
> it never will.  Even if we drop duplicate keys, we will not achieve
> the map behavior you described.
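Aaron's proposal, in which the server gives repeated row,colfam,colqual triples within one mutation slightly different timestamps, might be sketched as below. Everything here is hypothetical: `Update` stands in for one put() in a mutation, a null timestamp means the client did not set one, and stamping the n-th repeat with `serverTime + n` is an assumed policy that the thread does not specify. Following Keith's point, client-set timestamps pass through untouched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TimestampSketch {
    // One put() inside a mutation; a null timestamp means the client did not set one.
    static final class Update {
        final String row;
        final String colFam;
        final String colQual;
        final Long timestamp;
        final String value;

        Update(String row, String colFam, String colQual, Long timestamp, String value) {
            this.row = row;
            this.colFam = colFam;
            this.colQual = colQual;
            this.timestamp = timestamp;
            this.value = value;
        }

        List<String> column() {
            return List.of(row, colFam, colQual);
        }
    }

    // Assumed policy: the n-th unstamped put to the same cell (counting from 0) gets serverTime + n.
    static List<Update> assignTimestamps(List<Update> mutation, long serverTime) {
        Map<List<String>, Integer> repeats = new HashMap<>();
        List<Update> out = new ArrayList<>();
        for (Update u : mutation) {
            if (u.timestamp != null) {
                // Client-set timestamps are left alone, so exact duplicates can still occur.
                out.add(u);
                continue;
            }
            int n = repeats.merge(u.column(), 1, Integer::sum) - 1;
            out.add(new Update(u.row, u.colFam, u.colQual, serverTime + n, u.value));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Update> mutation = List.of(
                new Update("row1", "colfam1", "colqual1", null, "a"),
                new Update("row1", "colfam1", "colqual1", null, "b"),
                new Update("row1", "colfam1", "colqual1", 4L, "c"));
        for (Update u : assignTimestamps(mutation, 1000L)) {
            System.out.println(u.value + " @ " + u.timestamp);
        }
        // prints:
        // a @ 1000
        // b @ 1001
        // c @ 4
    }
}
```

This only separates puts inside a single mutation. A later mutation stamped at `serverTime + 1` for the same cell would collide with the second put, which is the separate-mutation case Keith raises.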
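Keith's last point can be illustrated with a toy model. It assumes two copies of one exact key sit in different sorted sources and that a deduplicating merge keeps the first copy it sees, breaking ties by source position. The class and method names are hypothetical, and the tie-break rule is an assumption, not a description of how Accumulo orders its sources. A `TreeMap` returns the last value put, while the deduplicated merge returns whichever source happens to come first, which need not match order of arrival.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

final class ArrivalOrderSketch {
    // TreeMap semantics: a second put() with an equal key replaces the value (last write wins).
    static String mapSemantics(String key, List<String> valuesInArrivalOrder) {
        TreeMap<String, String> map = new TreeMap<>();
        for (String v : valuesInArrivalOrder) {
            map.put(key, v);
        }
        return map.get(key);
    }

    // Toy k-way merge of sorted sources, then duplicate dropping. Equal keys are
    // tie-broken by source index alone, and only the first copy seen is kept.
    static List<Map.Entry<String, String>> mergeAndDedup(List<List<Map.Entry<String, String>>> sources) {
        // Each cursor is {sourceIndex, positionInSource}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.<int[], String>comparing(c -> sources.get(c[0]).get(c[1]).getKey())
                        .thenComparingInt(c -> c[0]));
        for (int s = 0; s < sources.size(); s++) {
            if (!sources.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<Map.Entry<String, String>> out = new ArrayList<>();
        String lastKey = null;
        while (!heap.isEmpty()) {
            int[] c = heap.poll();
            Map.Entry<String, String> e = sources.get(c[0]).get(c[1]);
            if (!e.getKey().equals(lastKey)) {
                out.add(e);
                lastKey = e.getKey();
            }
            if (c[1] + 1 < sources.get(c[0]).size()) {
                heap.add(new int[] {c[0], c[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String key = "row1,colfam1,colqual1,4";
        List<Map.Entry<String, String>> older = List.of(Map.entry(key, "valueA")); // arrived first
        List<Map.Entry<String, String>> newer = List.of(Map.entry(key, "valueB")); // arrived second
        System.out.println(mapSemantics(key, List.of("valueA", "valueB"))); // prints valueB
        System.out.println(mergeAndDedup(List.of(older, newer)).get(0).getValue()); // prints valueA
        System.out.println(mergeAndDedup(List.of(newer, older)).get(0).getValue()); // prints valueB
    }
}
```

The deduplicated result flips with source order even though the arrival order never changed, so dropping duplicates alone does not recover last-write-wins map behavior.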