Accumulo, mail # user - deletion technique question


Re: deletion technique question
Marc Reichman 2013-05-13, 15:24
The 1.5 solution looks nice.

I'm aware of the potential data-loss angle, and the sort ordering is also an
interesting point; thank you.

In my particular case, I may not be aware of all the column-visibility
permutations of a given key, but I want to replace them all with one
particular new visibility while keeping the same data. How would I go
about that? Is there a way to use a BatchScanner (step 1 of the
BatchDeleter approach) to pull down all the permutations, then issue
putDeletes for them and put what I want?

In my case I'm pulling one copy of the data down first to verify I have it
at the user's current scan auth, then using the #1 approach to clear it out
and then put it in again as the vis I need.
On Mon, May 13, 2013 at 10:05 AM, Keith Turner <[EMAIL PROTECTED]> wrote:

>
>
>
> On Fri, May 10, 2013 at 12:39 PM, Marc Reichman <
> [EMAIL PROTECTED]> wrote:
>
>> I have a table with rows which have 3 column values in one column family,
>> and a column visibility.
>>
>> There are situations where I will want to replace the row content with a
>> new column visibility; I understand that the visibility attributes are
>> immutable, so I will have to delete and re-put.
>>
>> Am I better off doing:
>> 1. BatchDeleter with authorizations to allow access, set range to the key
>> in question, call delete, and then put in mutations with the new visibility
>> 2. Create mutations with a putDelete followed by a put with the new
>> visibility for each value
>> 3. Something else entirely?
>>
>
> In 1.5, you can use ACCUMULO-956
>
>
>>
>> For option #2, can I simply do a putDelete on the column
>> family/qualifier? Or do I need to "know" the old authorizations to put in a
>> visibility expression with the putDelete?
>>
>> For all of these, can a client get up-to-the-minute results immediately
>> after? Or does some kind of compaction need to occur first?
>>
>
> If you send a mutation with a delete and put, the client will be able to
> see it after the batchwriter flushes or closes.  No compaction needed.
>
> I am a little fuzzy on #1.  Will you delete everything in one pass (using
> BatchDeleter), and then do another pass writing data with the updated colvis?
> If so, this would seem to imply that you are pulling the data from another
> source (other than the table the data was deleted from)?
>
> Make sure the method you choose is not susceptible to data loss in the
> event that the client dies.  For example, suppose a client is reading a table
> and writing a delete and an update mutation for each key/value read.  If
> the client died after some deletes were written, but not the corresponding
> updates, then that data would no longer be there to be read and transformed
> on the second run.
>
> When you change the colvis, you change the sort order.  Suppose you read a
> key K and change it to K', where K' sorts after K.  If you insert K', it's
> possible that you may read it, since it is being inserted ahead of the
> scanner's pointer.  Because of buffering in the batch writer and scanner,
> this would not always occur, but it would occur occasionally.  Something to
> be aware of.
>
>
>
>
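
The sort-order point above can be illustrated with a small sketch. It does not use the Accumulo API. It assumes a plain java.util.TreeMap standing in for the sorted table, a simplified hypothetical key layout of "row cf cq vis" compared as a string, and an unbuffered cursor (a real scanner and batch writer buffer, so the re-read would only happen occasionally). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ColvisRescanSketch {

    // Hypothetical flat key: row, family, qualifier, visibility, sorted as one string.
    static String key(String row, String cq, String vis) {
        return row + " cf " + cq + " " + vis;
    }

    // Builds a three-entry table where every entry carries the given visibility.
    static TreeMap<String, String> sample(String vis) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(key("r1", "q1", vis), "v1");
        table.put(key("r1", "q2", vis), "v2");
        table.put(key("r1", "q3", vis), "v3");
        return table;
    }

    // Walks the table in sorted order with an unbuffered cursor. Each entry whose
    // visibility is oldVis is deleted and re-inserted under newVis.
    // Returns every key the cursor read, in the order it read them.
    static List<String> rewrite(TreeMap<String, String> table, String oldVis, String newVis) {
        List<String> seen = new ArrayList<>();
        String cursor = table.isEmpty() ? null : table.firstKey();
        while (cursor != null) {
            seen.add(cursor);
            if (cursor.endsWith(" " + oldVis)) {
                String value = table.remove(cursor);
                String moved = cursor.substring(0, cursor.length() - oldVis.length()) + newVis;
                table.put(moved, value);
            }
            // higherKey works even though the cursor key may just have been removed.
            cursor = table.higherKey(cursor);
        }
        return seen;
    }

    public static void main(String[] args) {
        // New visibility sorts after the old one: rewritten keys land ahead of the cursor.
        System.out.println(rewrite(sample("A"), "A", "B"));
        // New visibility sorts before the old one: rewritten keys land behind the cursor.
        System.out.println(rewrite(sample("B"), "B", "A"));
    }
}
```

In this sketch the rewritten keys are read a second time only when the new visibility sorts after the old one. The loop is safe here because a key already carrying the new visibility is skipped; a transformation that is not idempotent in this way would need some other guard.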