Accumulo >> mail # user >> deletion technique question


Marc Reichman 2013-05-10, 16:39
Christopher 2013-05-10, 19:19
Marc Reichman 2013-05-10, 20:41
Keith Turner 2013-05-13, 15:05
Marc Reichman 2013-05-13, 15:24
Re: deletion technique question
On Mon, May 13, 2013 at 11:24 AM, Marc Reichman <
[EMAIL PROTECTED]> wrote:

> The 1.5 solution looks nice.
>
> Aware of the potential data loss angle and the sort ordering is also an
> interesting angle, thank you.
>
> In my particular case where I may not necessarily be aware of all
> permutations of column visibility of a given key but want to replace them
> all with a particular new visibility with the same data, how would I go
> about that? Is there a way to use a batchscanner (step 1 of the
> batchdeleter approach) to pull down all the permutations, then putdeletes
> for them and put what I want?
>

No.  It's like you said: you will only see entries based on the auths you
give the scanner.  There is no way to turn off colvis checking in a scan.
Using the transforming iterator from ACCUMULO-956 at compaction time is
a nice option because all data passes through iterators at compaction time.
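
To illustrate why a scan cannot enumerate every visibility permutation of a
key, here is a toy model in plain Java. It is not the Accumulo API: the names
Entry and scan are hypothetical, and the rule used (an entry is visible only if
every label it requires is among the scanner's auths) is a simplifying
assumption. Real visibility expressions also allow OR and parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class VisibilityScanSketch {
    // Hypothetical stand-in for a stored entry: a key plus the labels its visibility requires.
    static final class Entry {
        final String key;
        final Set<String> requiredLabels;
        final String value;

        Entry(String key, Set<String> requiredLabels, String value) {
            this.key = key;
            this.requiredLabels = requiredLabels;
            this.value = value;
        }
    }

    // Returns only the entries whose required labels are all covered by the scanner's auths.
    static List<Entry> scan(List<Entry> table, Set<String> auths) {
        List<Entry> visible = new ArrayList<>();
        for (Entry e : table) {
            if (auths.containsAll(e.requiredLabels)) {
                visible.add(e);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Entry> table = List.of(
            new Entry("row1:cf:cq", Set.of("A"), "v"),
            new Entry("row1:cf:cq", Set.of("A", "B"), "v"),
            new Entry("row1:cf:cq", Set.of("C"), "v"));
        // A scanner holding only auth A sees one of the three permutations.
        System.out.println(scan(table, Set.of("A")).size()); // prints 1
    }
}
```

In this model the other two permutations stay invisible to the client, so it
cannot issue deletes for them. That is the situation a compaction-time iterator
avoids, since compaction sees all data.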
>
> In my case I'm pulling one copy of the data down first to verify I have it
> at the user's current scan auth, then using the #1 approach to clear it out
> and then put it in again as the vis I need.
>

This is a good way to do it.  You could possibly clone the table instead of
pulling a copy down.
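
The delete-then-re-put sequence has a crash window, which the earlier message
quoted in this thread warns about. The sketch below is a toy model in plain
Java, not Accumulo code: the table is a TreeMap keyed by "row|visibility", and
twoPass and perRow are hypothetical names. perRow models the delete and the put
for a row as a single step, on the assumption that both travel in one Mutation
for that row.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CrashWindowSketch {
    // Toy key layout: "row|visibility".
    static String key(String row, String vis) {
        return row + "|" + vis;
    }

    // Strategy 1: delete every old entry first, then re-insert from a client-side copy.
    // crashAfterDeletes simulates the client dying between the two passes.
    static void twoPass(Map<String, String> table, List<String> rows,
                        String oldVis, String newVis, boolean crashAfterDeletes) {
        Map<String, String> copy = new TreeMap<>();
        for (String row : rows) {
            copy.put(row, table.remove(key(row, oldVis)));
        }
        if (crashAfterDeletes) {
            return; // the copy lived only in client memory and is lost
        }
        for (String row : rows) {
            table.put(key(row, newVis), copy.get(row));
        }
    }

    // Strategy 2: handle delete and put for one row as a single step.
    // Stops after maxRows rows to simulate a crash; rows already rewritten are skipped on a rerun.
    static void perRow(Map<String, String> table, List<String> rows,
                       String oldVis, String newVis, int maxRows) {
        int done = 0;
        for (String row : rows) {
            if (done == maxRows) {
                return;
            }
            String value = table.remove(key(row, oldVis));
            if (value != null) {
                table.put(key(row, newVis), value);
            }
            done++;
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2");

        Map<String, String> a = new TreeMap<>(Map.of("r1|old", "x", "r2|old", "y"));
        twoPass(a, rows, "old", "new", true);
        System.out.println(a); // table is empty: both values are gone

        Map<String, String> b = new TreeMap<>(Map.of("r1|old", "x", "r2|old", "y"));
        perRow(b, rows, "old", "new", 1);                 // crash after one row
        perRow(b, rows, "old", "new", Integer.MAX_VALUE); // rerun finishes the job
        System.out.println(b); // both rows now carry the new visibility
    }
}
```

In the toy model, a clone of the table (or any copy that outlives the client)
closes the window for the two-pass approach, because the second pass can be
rerun from it.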
>
>
> On Mon, May 13, 2013 at 10:05 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>
>>
>>
>>
>> On Fri, May 10, 2013 at 12:39 PM, Marc Reichman <
>> [EMAIL PROTECTED]> wrote:
>>
>>> I have a table with rows which have 3 column values in one column
>>> family, and a column visibility.
>>>
>>> There are situations where I will want to replace the row content with a
>>> new column visibility; I understand that the visibility attributes are
>>> immutable, so I will have to delete and re-put.
>>>
>>> Am I better off doing:
>>> 1. BatchDeleter with authorizations to allow access, set range to the
>>> key in question, call delete, and then put in mutations with the new
>>> visibility
>>> 2. Create mutations with a putDelete followed by a put with the new
>>> visibility for each value
>>> 3. Something else entirely?
>>>
>>
>> In 1.5, you can use ACCUMULO-956
>>
>>
>>>
>>> For option #2, can I simply do a putDelete on the column
>>> family/qualifier? Or do I need to "know" the old authorizations to put in a
>>> visibility expression with the putDelete?
>>>
>>> For all of these, can a client get up-to-the-minute results immediately
>>> after? Or does some kind of compaction need to occur first?
>>>
>>
>> If you send a mutation with a delete and put, the client will be able to
>> see it after the batchwriter flushes or closes.  No compaction needed.
>>
>> I am a little fuzzy on #1.  Will you delete everything in one pass (using
>> batchdeleter), and then do another pass writing data w/ updated colvis?  If
>> so, this would seem to imply that you are pulling the data from another
>> source (other than the table the data was deleted from)?
>>
>> Make sure the method you choose is not susceptible to data loss in the
>> event that the client dies.  For example, suppose a client is reading a
>> table and then writing a delete and an update mutation for each key/val
>> read.  If the client died after some deletes were written, but not the
>> corresponding updates, then that data would not be seen, and so would not
>> be transformed, on the second run.
>>
>> When you change the colvis, you change the sort order.  Suppose you read a
>> key K and change it to K', where K' sorts after K.  If you insert K', it's
>> possible that you may read it, since it is being inserted in front of the
>> scanner's pointer.  Because of buffering in the batch writer and scanner,
>> this would not always occur, but it would occur occasionally.  Something
>> to be aware of.
>>
>>
>>
>>
>
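
The sort-order caveat in the quoted message can be reproduced with a toy model
in plain Java. This is not Accumulo code: the table is a TreeMap whose keys are
"row:cf:cq:vis" strings, transformWhileScanning is a hypothetical name, and
labels are assumed to contain no ':' character. The model has no buffering, so
the re-read happens every time, whereas the message says it would only occur
occasionally with a real scanner and batch writer.

```java
import java.util.TreeMap;

final class SortOrderSketch {
    // Walks the sorted table from the start. Each entry whose visibility equals oldVis is
    // rewritten under newVis while the walk is still in progress.
    // Returns how many entries the walk read.
    static int transformWhileScanning(TreeMap<String, String> table, String oldVis, String newVis) {
        int read = 0;
        String key = table.isEmpty() ? null : table.firstKey();
        while (key != null) {
            read++;
            int sep = key.lastIndexOf(':');
            if (key.substring(sep + 1).equals(oldVis)) {
                String value = table.remove(key);
                table.put(key.substring(0, sep + 1) + newVis, value);
            }
            // Advance the scan pointer to the next key strictly after the one just read.
            key = table.higherKey(key);
        }
        return read;
    }

    public static void main(String[] args) {
        TreeMap<String, String> later = new TreeMap<>();
        later.put("row1:cf:cq:a", "v");
        // "z" sorts after "a", so the rewritten key lands ahead of the pointer and is read again.
        System.out.println(transformWhileScanning(later, "a", "z")); // prints 2

        TreeMap<String, String> earlier = new TreeMap<>();
        earlier.put("row1:cf:cq:a", "v");
        // "A" sorts before "a", so the rewritten key lands behind the pointer.
        System.out.println(transformWhileScanning(earlier, "a", "A")); // prints 1
    }
}
```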