Efficiently wiping out random data?
Hey devs,

I was presenting at GOTO Amsterdam yesterday and I got a question
about a scenario that I'd never thought about before. I'm wondering
what others think.

How do you efficiently wipe out random data in HBase?

For example, you have a website and a user asks you to close their
account and get rid of the data.

Would you say "sure can do, lemme just issue a couple of Deletes!" and
call it a day? What if you really have to delete the data, not just
mask it, because of contractual obligations or local laws?
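For reference, here's a minimal sketch of what those Deletes look
like with the client API (the "users" table and the row key are made
up for the example):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Delete;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.util.Bytes;

  public class WipeUser {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "users");
      // This only writes a tombstone marker: the user's cells are
      // masked from reads, but the bytes still sit in the HFiles on
      // disk until a major compaction rewrites them.
      table.delete(new Delete(Bytes.toBytes("user12345")));
      table.close();
    }
  }

The point is that delete() returns as soon as the tombstone is
written; nothing is physically removed at that moment.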

Major compacting is the obvious solution, but it seems really
inefficient. Let's say you've got some truly random data to delete,
and it so happens that you have at least one row per region to get
rid of... then you basically need to rewrite the whole table?
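To make the deleted cells physically disappear today, you'd have to
force that rewrite yourself, something like (again with the
hypothetical "users" table):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class PurgeDeletes {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      // Asynchronously asks every region of the table to major
      // compact, rewriting all of its HFiles and dropping the
      // tombstoned cells for good.
      admin.majorCompact("users");
    }
  }

And that's the expensive part: majorCompact() rewrites every HFile
in every region, even if only one row in the whole table was
deleted.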

That was more or less my answer: I told the attendee that it's not
an easy use case to manage in HBase.

Thoughts?

J-D