Accumulo >> mail # user >> deletion technique question


Marc Reichman 2013-05-10, 16:39
Christopher 2013-05-10, 19:19
Marc Reichman 2013-05-10, 20:41
Keith Turner 2013-05-13, 15:05
Re: deletion technique question
The 1.5 solution looks nice.

I'm aware of the potential data loss angle, and the sort ordering is also an
interesting point, thank you.

In my particular case, I may not be aware of all the column visibility
permutations stored for a given key, but I want to replace them all with one
new visibility carrying the same data. How would I go about that? Is there a
way to use a BatchScanner (step 1 of the BatchDeleter approach) to pull down
all the permutations, then putDelete each of them and put what I want?

In my case I'm pulling one copy of the data down first to verify I have it
at the user's current scan authorizations, then using approach #1 to clear it
out and putting it in again with the visibility I need.
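
A rough model of the scan-then-rewrite idea asked about above, written in plain
Java rather than Accumulo client code. It assumes the visibility is part of
each key, so every stored variant needs its own delete naming that visibility,
and that the scan runs with authorizations broad enough to return every
variant. The class VisibilityRewritePlan, the KEY_ORDER comparator, and the
string form of the operations are all made up for illustration; the timestamp
part of the key is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model only; none of these names come from the Accumulo API.
final class VisibilityRewritePlan {

    // A key is [row, family, qualifier, visibility], compared in that order.
    static final Comparator<List<String>> KEY_ORDER = (a, b) -> {
        for (int i = 0; i < 4; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return 0;
    };

    // Plan the update for one row: a delete for every stored visibility
    // variant that differs from newVis, then one put per column under newVis.
    static List<String> plan(TreeMap<List<String>, String> table,
                             String row, String newVis) {
        List<String> ops = new ArrayList<>();
        Map<String, String> valueByColumn = new LinkedHashMap<>();
        for (Map.Entry<List<String>, String> e : table.entrySet()) {
            List<String> k = e.getKey();
            if (!k.get(0).equals(row)) continue;      // stand-in for a row range
            String column = k.get(1) + ":" + k.get(2);
            // Assumption from the question: every variant holds the same data.
            valueByColumn.putIfAbsent(column, e.getValue());
            if (!k.get(3).equals(newVis)) {
                ops.add("DELETE " + column + " [" + k.get(3) + "]");
            }
        }
        for (Map.Entry<String, String> c : valueByColumn.entrySet()) {
            ops.add("PUT " + c.getKey() + " [" + newVis + "] = " + c.getValue());
        }
        return ops;
    }

    public static void main(String[] args) {
        TreeMap<List<String>, String> table = new TreeMap<>(KEY_ORDER);
        table.put(List.of("r1", "cf", "a", "A"), "v1");
        table.put(List.of("r1", "cf", "a", "B"), "v1");
        table.put(List.of("r1", "cf", "b", "A"), "v2");
        plan(table, "r1", "C").forEach(System.out::println);
    }
}
```

In a real client the planned deletes and puts for a row would presumably go
into a single mutation, so that they are applied together.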

On Mon, May 13, 2013 at 10:05 AM, Keith Turner <[EMAIL PROTECTED]> wrote:

>
>
>
> On Fri, May 10, 2013 at 12:39 PM, Marc Reichman <
> [EMAIL PROTECTED]> wrote:
>
>> I have a table with rows which have 3 column values in one column family,
>> and a column visibility.
>>
>> There are situations where I will want to replace the row content with a
>> new column visibility; I understand that the visibility attributes are
>> immutable, so I will have to delete and re-put.
>>
>> Am I better off doing:
>> 1. BatchDeleter with authorizations to allow access, set range to the key
>> in question, call delete, and then put in mutations with the new visibility
>> 2. Create mutations with a putDelete followed by a put with the new
>> visibility for each value
>> 3. Something else entirely?
>>
>
> In 1.5, you can use ACCUMULO-956
>
>
>>
>> For option #2, can I simply do a putDelete on the column
>> family/qualifier? Or do I need to "know" the old authorizations to put in a
>> visibility expression with the putDelete?
>>
>> For all of these, can a client get up-to-the-minute results immediately
>> after? Or does some kind of compaction need to occur first?
>>
>
> If you send a mutation with a delete and put, the client will be able to
> see it after the batchwriter flushes or closes.  No compaction needed.
>
> I am a little fuzzy on #1.  Will you delete everything in one pass (using
> BatchDeleter), and then do another pass writing data w/ updated colvis?  If
> so, this would seem to imply that you are pulling the data from another
> source (other than the table the data was deleted from)?
>
> Make sure the method you choose is not susceptible to data loss in the
> event that the client dies.  For example, suppose a client is reading a
> table and then writing a delete and an update mutation for each key/val
> read.  If the client died after some deletes were written, but not the
> corresponding updates, then the second run would not see that data, so it
> could not be transformed.
>
> When you change the colvis, you change the sort order.  Suppose you read a
> key K and change it to K', where K' sorts after K.  If you insert K', it's
> possible that you may read it, since it is being inserted in front of the
> scanner's pointer.  Because of buffering in the batch writer and scanner,
> this would not always occur, but it would occur occasionally.  Something
> to be aware of.
>
>
>
>
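
The re-read scenario in the quoted reply can be modelled with an ordinary
sorted map. This is only a sketch in plain Java, not Accumulo code: the class
ScanWhileRewriting, the "row/visibility" key format, and both method names are
invented here. The model has no buffering, so the re-read happens every time
rather than occasionally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Plain-Java model of a sorted table; nothing here is Accumulo API.
final class ScanWhileRewriting {

    // Keys look like "row/visibility", so they sort by row, then visibility.
    static String withVisibility(String key, String vis) {
        return key.substring(0, key.lastIndexOf('/') + 1) + vis;
    }

    // Rewrite each entry as the scan pointer reaches it.
    // Returns every key the scan handed back, in order.
    static List<String> rewriteWhileScanning(TreeMap<String, String> table,
                                             String newVis) {
        List<String> seen = new ArrayList<>();
        String cursor = table.isEmpty() ? null : table.firstKey();
        while (cursor != null) {
            seen.add(cursor);
            String rewritten = withVisibility(cursor, newVis);
            if (!rewritten.equals(cursor)) {
                String value = table.remove(cursor); // delete K
                table.put(rewritten, value);         // insert K'
            }
            cursor = table.higherKey(cursor);        // advance the pointer
        }
        return seen;
    }

    // Read all keys first, then write: the scan never meets its own output.
    static List<String> rewriteFromSnapshot(TreeMap<String, String> table,
                                            String newVis) {
        List<String> seen = new ArrayList<>(table.keySet());
        for (String key : seen) {
            String rewritten = withVisibility(key, newVis);
            if (!rewritten.equals(key)) {
                table.put(rewritten, table.remove(key));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("r1/A", "x");
        table.put("r2/A", "y");
        // "Z" sorts after "A", so each rewritten key lands ahead of the pointer.
        System.out.println(rewriteWhileScanning(table, "Z"));
    }
}
```

The snapshot variant only suits data small enough to hold in memory. Another
option is to have the scan skip any key that already carries the target
visibility, which also keeps a second run after a client failure harmless.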
Keith Turner 2013-05-13, 16:23