Accumulo >> mail # user >> deletion technique question


Re: deletion technique question
On Mon, May 13, 2013 at 11:24 AM, Marc Reichman <
[EMAIL PROTECTED]> wrote:

> The 1.5 solution looks nice.
>
> Aware of the potential data loss angle and the sort ordering is also an
> interesting angle, thank you.
>
> In my particular case where I may not necessarily be aware of all
> permutations of column visibility of a given key but want to replace them
> all with a particular new visibility with the same data, how would I go
> about that? Is there a way to use a batchscanner (step 1 of the
> batchdeleter approach) to pull down all the permutations, then putdeletes
> for them and put what I want?
>

No.  It's as you said: you will only see entries that match the auths you
give the scanner.  There is no way to turn off colvis checking in a scan.
Using the transforming iterator from ACCUMULO-956 at compaction time is
a nice option, because all data passes through iterators at compaction time.
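
To make the point about scan auths concrete, here is a minimal sketch in
plain Java. It is a toy model, not Accumulo API: the class and method names
are made up for illustration, and visibilities are simplified to labels
joined by '&' (real expressions also allow '|' and parentheses). It only
shows that a scan returns the entries the given auths satisfy, so a client
cannot list every visibility permutation of a key from a scan alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these classes are Accumulo API.
final class VisibilityFilterSketch {

    /** One stored entry; the same row and value may exist under several visibilities. */
    static final class Entry {
        final String row;
        final String visibility; // simplified: labels joined by '&' only
        final String value;

        Entry(String row, String visibility, String value) {
            this.row = row;
            this.visibility = visibility;
            this.value = value;
        }
    }

    /** True when every label in the visibility is among the scan authorizations. */
    static boolean isVisible(String visibility, Set<String> auths) {
        if (visibility.isEmpty()) {
            return true; // an empty visibility is readable by anyone
        }
        for (String label : visibility.split("&")) {
            if (!auths.contains(label)) {
                return false;
            }
        }
        return true;
    }

    /** Returns only the entries the given authorizations satisfy; there is no bypass. */
    static List<Entry> scan(List<Entry> table, Set<String> auths) {
        List<Entry> visible = new ArrayList<>();
        for (Entry e : table) {
            if (isVisible(e.visibility, auths)) {
                visible.add(e);
            }
        }
        return visible;
    }

    /** Hypothetical data: one row stored under three different visibilities. */
    static List<Entry> sampleTable() {
        return Arrays.asList(
                new Entry("row1", "a", "data"),
                new Entry("row1", "a&b", "data"),
                new Entry("row1", "c", "data"));
    }

    public static void main(String[] args) {
        List<Entry> table = sampleTable();
        Set<String> userAuths = new HashSet<>(Arrays.asList("a"));
        System.out.println("stored entries:  " + table.size());
        System.out.println("visible entries: " + scan(table, userAuths).size());
    }
}
```

In this model a client holding only auth "a" sees one of the three stored
copies, so deletes built from that scan would leave the other two in place.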
>
> In my case I'm pulling one copy of the data down first to verify I have it
> at the user's current scan auth, then using the #1 approach to clear it out
> and then put it in again as the vis I need.
>

This is a good way to do it.  You could possibly clone the table instead of
pulling a copy down.
>
>
> On Mon, May 13, 2013 at 10:05 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>
>>
>>
>>
>> On Fri, May 10, 2013 at 12:39 PM, Marc Reichman <
>> [EMAIL PROTECTED]> wrote:
>>
>>> I have a table with rows which have 3 column values in one column
>>> family, and a column visibility.
>>>
>>> There are situations where I will want to replace the row content with a
>>> new column visibility; I understand that the visibility attributes are
>>> immutable, so I will have to delete and re-put.
>>>
>>> Am I better off doing:
>>> 1. BatchDeleter with authorizations to allow access, set range to the
>>> key in question, call delete, and then put in mutations with the new
>>> visibility
>>> 2. Create mutations with a putDelete followed by a put with the new
>>> visibility for each value
>>> 3. Something else entirely?
>>>
>>
>> In 1.5, you can use ACCUMULO-956
>>
>>
>>>
>>> For option #2, can I simply do a putDelete on the column
>>> family/qualifier? Or do I need to "know" the old authorizations to put in a
>>> visibility expression with the putDelete?
>>>
>>> For all of these, can a client get up-to-the-minute results immediately
>>> after? Or does some kind of compaction need to occur first?
>>>
>>
>> If you send a mutation with a delete and put, the client will be able to
>> see it after the batchwriter flushes or closes.  No compaction needed.
>>
>> I am a little fuzzy on #1.  Will you delete everything in one pass (using
>> batchdeleter), and then do another pass writing data w/ updated colvis?  If
>> so, this would seem to imply that you are pulling the data from another
>> source (other than the table the data was deleted from)?
>>
>> Make sure the method you choose is not susceptible to data loss in the
>> event that the client dies.  For example, suppose a client is reading a
>> table and then writing delete and update mutations for each key/val read.
>> If the client died after some deletes were written, but not the
>> corresponding updates, then that data would be lost: a second run would
>> no longer see it in order to transform it.
>>
>> When you change the colvis, you change the sort order.  Say you read a key
>> K and change it to K', where K' sorts after K.  If you insert K', it's
>> possible that you may read it, since it's being inserted in front of the
>> scanner's pointer.  Because of buffering in the batch writer and scanner,
>> this would not always occur, but it would occur occasionally.  Something
>> to be aware of.
>>
>>
>>
>>
>
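
The sort-order caveat in the quoted message above can be illustrated with a
small sketch in plain Java. It is a toy model under stated assumptions, not
Accumulo code: a TreeMap stands in for the sorted table, the key is
collapsed to "row|visibility", and there is no buffering, so the re-read
happens every time here rather than occasionally. The class, methods, and
sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: a TreeMap stands in for a sorted table; this is not Accumulo API.
final class RescanHazardSketch {

    /** Simplified key "row|visibility", sorted lexicographically. */
    static String key(String row, String vis) {
        return row + "|" + vis;
    }

    /**
     * Walks the table in key order and rewrites every entry whose visibility is
     * oldVis so that it uses newVis instead. Returns the keys in the order the
     * scan read them, including any rewritten keys it ran into again.
     */
    static List<String> scanAndRewrite(TreeMap<String, String> table,
                                       String oldVis, String newVis) {
        List<String> keysRead = new ArrayList<>();
        String cursor = table.isEmpty() ? null : table.firstKey();
        while (cursor != null) {
            keysRead.add(cursor);
            int sep = cursor.indexOf('|');
            String row = cursor.substring(0, sep);
            String vis = cursor.substring(sep + 1);
            if (vis.equals(oldVis)) {
                // Delete K and insert K' while the scan is still running.
                String value = table.remove(cursor);
                table.put(key(row, newVis), value);
            }
            // Advance to the next key after the cursor; a K' that sorts
            // after K is now in front of the cursor and will be read.
            cursor = table.higherKey(cursor);
        }
        return keysRead;
    }

    /** Hypothetical data: two rows stored with visibility "secret". */
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(key("r1", "secret"), "v1");
        table.put(key("r2", "secret"), "v2");
        return table;
    }

    public static void main(String[] args) {
        // "topsecret" sorts after "secret": rewritten keys are read again.
        System.out.println(scanAndRewrite(sampleTable(), "secret", "topsecret"));
        // "public" sorts before "secret": rewritten keys land behind the cursor.
        System.out.println(scanAndRewrite(sampleTable(), "secret", "public"));
    }
}
```

In this model the first call reads four keys for two stored entries, because
each rewritten key lands ahead of the cursor; the second call reads only the
two original keys. A transform that is not safe to apply twice would need to
guard against the first case.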