HBase >> mail # dev >> Efficiently wiping out random data?


Jean-Daniel Cryans 2013-06-19, 12:31
Re: Efficiently wiping out random data?
JD,

   What about adding a flag to the delete, something like -full or
-true (it is early)?  Once we issue the delete to the proper row/region, we
run a flush and then execute a major compaction on just that region.  That
way, if it is a single record or a small subset of data, the impact is
minimal.  If the delete happens to hit every region, we will compact every
region (not ideal).  Another thought would be an overwrite, but with
versions this logic becomes more complicated.
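
To make the cost of this idea concrete, here is a rough sketch (not HBase code) of working out which regions would need the flush and major compaction after a batch of deletes. Everything in it is an assumption for illustration: the class RegionCompactionPlanner, its method names, and the region names are hypothetical, and row keys are modeled as plain ASCII strings so that String ordering matches HBase's byte ordering. The actual delete, flush, and compaction calls against the cluster are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of HBase: models a table as a sorted map of
// region start key -> region name and reports which regions a set of
// deleted rows falls into.
final class RegionCompactionPlanner {
    // Regions sorted by start key; the first region has the empty start key.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionCompactionPlanner(Map<String, String> regionsByStartKey) {
        this.regionsByStartKey.putAll(regionsByStartKey);
    }

    // A row belongs to the region with the greatest start key <= row key.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalArgumentException("No region covers row: " + rowKey);
        }
        return entry.getValue();
    }

    // Each affected region is listed once, however many deleted rows it holds.
    Set<String> regionsToCompact(List<String> deletedRowKeys) {
        Set<String> regions = new TreeSet<>();
        for (String rowKey : deletedRowKeys) {
            regions.add(regionFor(rowKey));
        }
        return regions;
    }
}
```

With three regions starting at "", "g" and "p", deleting only "user123" touches a single region, while deleting "alice", "henry" and "user123" touches all three, which is the whole-table rewrite case.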
On Wed, Jun 19, 2013 at 8:31 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> Hey devs,
>
> I was presenting at GOTO Amsterdam yesterday and I got a question
> about a scenario that I've never thought about before. I'm wondering
> what others think.
>
> How do you efficiently wipe out random data in HBase?
>
> For example, you have a website and a user asks you to close their
> account and get rid of the data.
>
> Would you say "sure can do, lemme just issue a couple of Deletes!" and
> call it a day? What if you really have to delete the data, not just
> mask it, because of contractual obligations or local laws?
>
> Major compacting is the obvious solution but it seems really
> inefficient. Let's say you've got some truly random data to delete and
> it happens so that you have at least one row per region to get rid
> of... then you need to basically rewrite the whole table?
>
> My answer was such, and I told the attendee that it's not an easy use
> case to manage in HBase.
>
> Thoughts?
>
> J-D
>

--
Kevin O'Dell
Systems Engineer, Cloudera