Accumulo, mail # user - deletion technique question


Re: deletion technique question
Keith Turner 2013-05-13, 15:05
On Fri, May 10, 2013 at 12:39 PM, Marc Reichman <
[EMAIL PROTECTED]> wrote:

> I have a table with rows which have 3 column values in one column family,
> and a column visibility.
>
> There are situations where I will want to replace the row content with a
> new column visibility; I understand that the visibility attributes are
> immutable, so I will have to delete and re-put.
>
> Am I better off doing:
> 1. BatchDeleter with authorizations to allow access, set range to the key
> in question, call delete, and then put in mutations with the new visibility
> 2. Create mutations with a putDelete followed by a put with the new
> visibility for each value
> 3. Something else entirely?
>

In 1.5, you can use ACCUMULO-956
>
> For option #2, can I simply do a putDelete on the column family/qualifier?
> Or do I need to "know" the old authorizations to put in a visibility
> expression with the putDelete?
>
> For all of these, can a client get up-to-the-minute results immediately
> after? Or does some kind of compaction need to occur first?
>

If you send a mutation with a delete and put, the client will be able to
see it after the batchwriter flushes or closes.  No compaction needed.

I am a little fuzzy on #1.  Will you delete everything in one pass (using
BatchDeleter), and then do another pass writing data w/ updated colvis?  If
so, this would seem to imply that you are pulling the data from another
source (other than the table the data was deleted from)?

Make sure the method you choose is not susceptible to data loss in the event
that the client dies.  For example, suppose a client is reading a table and
writing a delete and an update mutation for each key/val read.  If the
client dies after some deletes were written, but not the corresponding
updates, then that data is gone from the table and would not be seen, or
transformed, on the second run.
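
To make the failure mode concrete, here is a toy sketch in plain Java.  It
does not use the Accumulo API: a TreeMap stands in for the table, keys are
hypothetical "row|colvis" strings, and the class and method names are made
up for illustration.  It assumes the delete and the put for a key either
reach the table as two separate writes or together as one unit (which is
my understanding of what putting both in a single mutation for the row
gives you).

```java
import java.util.TreeMap;

class RelabelCrashSketch {
    // Toy table: key is "row|colvis", value is the cell value (all hypothetical).
    static TreeMap<String, String> newTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row1|old", "v1");
        table.put("row2|old", "v2");
        return table;
    }

    // Simulates a client that dies while relabeling the first key.
    // Returns the table contents as they are left after the crash.
    static TreeMap<String, String> relabelWithCrash(boolean deleteAndPutTogether) {
        TreeMap<String, String> table = newTable();
        String key = table.firstKey();
        String value = table.remove(key);                  // the delete reaches the table
        if (deleteAndPutTogether) {
            table.put(key.replace("|old", "|new"), value); // the put arrives with the delete
        }
        // Client dies here. With separate writes the put was never sent,
        // so the value now exists only in the dead client's memory.
        return table;
    }

    public static void main(String[] args) {
        System.out.println("separate writes: " + relabelWithCrash(false));
        System.out.println("together:        " + relabelWithCrash(true));
    }
}
```

With separate writes, row1 is missing after the crash and a second run has
nothing left to transform for it.  With the delete and put applied
together, row1 already carries the new colvis and row2 is still waiting
under the old one, so a second run can finish the job.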

When you change the colvis, you change the sort order.  Say you read a key
K and change it to K', where K' sorts after K.  If you insert K', it's
possible that you will read it later in the same scan, because it is being
inserted in front of the scanner's pointer.  Because of buffering in the
batch writer and scanner, this would not always occur, but it would occur
occasionally.  Something to be aware of.
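
Here is a toy sketch of that effect in plain Java, again with a TreeMap
standing in for the table and hypothetical "row|colvis" string keys (not
the Accumulo API).  It models the worst case with no buffering at all:
every write is visible immediately and the "scanner" always seeks to the
next key after the one it last returned.  The skip check on the new colvis
is just one possible precaution I am assuming for illustration.

```java
import java.util.TreeMap;

class ColvisRescanSketch {
    // Toy table: key is "row|colvis"; "B" sorts after "A" within the same row.
    static TreeMap<String, String> newTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row1|A", "v1");
        table.put("row2|A", "v2");
        return table;
    }

    // Scans the table while relabeling colvis A -> B and returns how many
    // rewrites were issued. With skipAlreadyRelabeled=false every entry the
    // scan returns is rewritten, including the ones this run just inserted.
    static int scanAndRelabel(boolean skipAlreadyRelabeled) {
        TreeMap<String, String> table = newTable();
        int rewrites = 0;
        String key = table.firstKey();
        while (key != null) {
            boolean alreadyNew = key.endsWith("|B");
            if (!(skipAlreadyRelabeled && alreadyNew)) {
                String value = table.remove(key);                            // delete K
                table.put(key.substring(0, key.length() - 1) + "B", value);  // insert K'
                rewrites++;
            }
            key = table.higherKey(key);  // scanner pointer moves to the next key after K
        }
        return rewrites;
    }

    public static void main(String[] args) {
        System.out.println("naive rewrites:   " + scanAndRelabel(false));
        System.out.println("guarded rewrites: " + scanAndRelabel(true));
    }
}
```

In this model the naive scan handles each of the two entries twice, once
as K and once as K', while the guarded scan touches each only once.  On a
real table the buffering mentioned above means you would hit the re-read
only some of the time, which makes it easy to miss in testing.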