
HBase >> mail # dev >> Efficiently wiping out random data?

Hey devs,

I was presenting at GOTO Amsterdam yesterday and got a question
about a scenario I'd never thought about before. I'm wondering
what others think.

How do you efficiently wipe out random data in HBase?

For example, you have a website and a user asks you to close their
account and get rid of the data.

Would you say "sure can do, lemme just issue a couple of Deletes!" and
call it a day? What if you really have to delete the data, not just
mask it, because of contractual obligations or local laws?

Major compaction is the obvious solution, but it seems really
inefficient. Say you have some truly random data to delete, and
it so happens that you have at least one row per region to get rid
of... then you basically need to rewrite the whole table?
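For anyone following along, here's a toy model of the mechanics I mean. This is plain Java, not HBase code, and all names in it are made up; it just sketches why a Delete only writes a tombstone that masks the cells, and why physically removing them costs a rewrite of the store files:

```java
import java.util.*;

/** Toy model (NOT real HBase code) of LSM delete semantics:
 *  a Delete writes a tombstone marker, and the deleted cells stay
 *  in the store files until a major compaction rewrites them. */
public class ToyStore {
    static final String TOMBSTONE = "__tombstone__";

    // each flushed "store file" is an immutable snapshot, newest last
    private final List<Map<String, String>> storeFiles = new ArrayList<>();

    /** Flush a batch of edits as a new store file (a Put or a Delete). */
    void flush(Map<String, String> edits) {
        storeFiles.add(Collections.unmodifiableMap(new HashMap<>(edits)));
    }

    /** Reads merge files newest-first; a tombstone masks older values. */
    String get(String key) {
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String v = storeFiles.get(i).get(key);
            if (v != null) return TOMBSTONE.equals(v) ? null : v;
        }
        return null;
    }

    /** True if the value bytes still exist in any file on "disk". */
    boolean physicallyPresent(String key) {
        for (Map<String, String> f : storeFiles) {
            String v = f.get(key);
            if (v != null && !TOMBSTONE.equals(v)) return true;
        }
        return false;
    }

    /** Major compaction: rewrite every file into one, dropping tombstones
     *  and the cells they mask -- this is the full-table rewrite cost. */
    void majorCompact() {
        Map<String, String> merged = new HashMap<>();
        for (Map<String, String> f : storeFiles) merged.putAll(f); // oldest -> newest
        merged.values().removeIf(TOMBSTONE::equals);
        storeFiles.clear();
        storeFiles.add(Collections.unmodifiableMap(merged));
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.flush(Map.of("user12345", "account data")); // the Put
        store.flush(Map.of("user12345", TOMBSTONE));      // the Delete
        System.out.println(store.get("user12345"));             // masked
        System.out.println(store.physicallyPresent("user12345")); // still on disk
        store.majorCompact();
        System.out.println(store.physicallyPresent("user12345")); // gone
    }
}
```

The point being: the Delete makes the data unreadable immediately, but the bytes survive until that compaction runs, and nothing short of it removes them.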

That was more or less my answer; I told the attendee that it's not
an easy use case to handle in HBase.


Kevin Odell 2013-06-19, 12:39
Jean-Daniel Cryans 2013-06-19, 12:46
Kevin Odell 2013-06-19, 12:48
Jesse Yates 2013-06-19, 15:12
Todd Lipcon 2013-06-19, 16:27
Ian Varley 2013-06-19, 18:28
Matt Corgan 2013-06-19, 21:15
lars hofhansl 2013-06-20, 09:35
Jean-Marc Spaggiari 2013-06-20, 12:39
Ian Varley 2013-06-23, 18:53
Andrew Purtell 2013-06-23, 22:32
Andrew Purtell 2013-06-23, 22:31
Jean-Daniel Cryans 2013-06-24, 17:58