HBase >> mail # dev >> Efficiently wiping out random data?


Jean-Daniel Cryans 2013-06-19, 12:31
Kevin Odell 2013-06-19, 12:39
Re: Efficiently wiping out random data?
That sounds like a very effective way for developers to kill clusters
with compactions :)

J-D

On Wed, Jun 19, 2013 at 2:39 PM, Kevin O'Dell <[EMAIL PROTECTED]> wrote:
> JD,
>
>    What about adding a flag for the delete, something like -full or
> -true (it is early)? Once we issue the delete to the proper row/region, we
> run a flush, then execute a single-region major compaction. That way, if
> it is a single record or a subset of data, the impact is minimal. If the
> delete happens to hit every region, we will compact every region (not ideal).
> Another thought would be an overwrite, but with versions this logic
> becomes more complicated.
>
>
> On Wed, Jun 19, 2013 at 8:31 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
>> Hey devs,
>>
>> I was presenting at GOTO Amsterdam yesterday and I got a question
>> about a scenario that I've never thought about before. I'm wondering
>> what others think.
>>
>> How do you efficiently wipe out random data in HBase?
>>
>> For example, you have a website and a user asks you to close their
>> account and get rid of the data.
>>
>> Would you say "sure can do, lemme just issue a couple of Deletes!" and
>> call it a day? What if you really have to delete the data, not just
>> mask it, because of contractual obligations or local laws?
>>
>> Major compacting is the obvious solution but it seems really
>> inefficient. Let's say you've got some truly random data to delete and
>> it so happens that you have at least one row per region to get rid
>> of... then you basically need to rewrite the whole table?
>>
>> My answer was such, and I told the attendee that it's not an easy use
>> case to manage in HBase.
>>
>> Thoughts?
>>
>> J-D
>>
>
>
>
> --
> Kevin O'Dell
> Systems Engineer, Cloudera
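
The trade-off discussed above (a per-region flush plus major compaction is cheap when the deleted rows sit in one region, but degenerates into rewriting the whole table when they are scattered) can be sketched with a small stand-alone model. This is only an illustration, not HBase client code: the function names, split points, region sizes, and row keys are all hypothetical, and it assumes only that regions are contiguous key ranges identified by sorted start keys.

```python
from bisect import bisect_right


def region_for_row(start_keys, row):
    """Index of the region whose key range contains `row`.

    `start_keys` is the sorted list of region start keys (inclusive);
    the first region starts at the empty key b"".
    """
    return bisect_right(start_keys, row) - 1


def regions_to_compact(start_keys, deleted_rows):
    """Distinct regions that hold at least one deleted row."""
    return sorted({region_for_row(start_keys, r) for r in deleted_rows})


def rewrite_fraction(region_sizes, regions):
    """Share of the table rewritten if only `regions` are major compacted."""
    return sum(region_sizes[i] for i in regions) / sum(region_sizes)


# Hypothetical table: four equally sized regions split at b"g", b"n", b"t".
start_keys = [b"", b"g", b"n", b"t"]
region_sizes = [10, 10, 10, 10]

# Rows sharing a key prefix land in one region.
clustered = [b"user42|a", b"user42|b"]
# Randomly distributed keys land in every region.
scattered = [b"alice", b"henry", b"oscar", b"zoe"]

for name, rows in (("clustered", clustered), ("scattered", scattered)):
    hit = regions_to_compact(start_keys, rows)
    print(name, hit, rewrite_fraction(region_sizes, hit))
```

Under these assumed numbers the clustered delete touches a quarter of the table, while the scattered one touches all of it, which is the "compact every region" case described as not ideal.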
Kevin Odell 2013-06-19, 12:48
Jesse Yates 2013-06-19, 15:12
Todd Lipcon 2013-06-19, 16:27
Ian Varley 2013-06-19, 18:28
Matt Corgan 2013-06-19, 21:15
lars hofhansl 2013-06-20, 09:35
Jean-Marc Spaggiari 2013-06-20, 12:39
Ian Varley 2013-06-23, 18:53
Andrew Purtell 2013-06-23, 22:32
Andrew Purtell 2013-06-23, 22:31
Jean-Daniel Cryans 2013-06-24, 17:58