Re: Efficiently wiping out random data?
Thank you so much for all the answers, guys. Looks like I should write
up something for the ref guide!


On Sun, Jun 23, 2013 at 3:31 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> Right: compaction followed by a 'secure' HDFS-level delete that rewrites
> the remnant's blocks with random data. Even then it's difficult to say
> that nothing recoverable remains, though in practical terms the
> hypothetical user here could be assured no API of HBase or HDFS could
> ever retrieve the data.
> Or burn the platters to ash.
> On Sunday, June 23, 2013, Ian Varley wrote:
>> One more followup on this, after talking to some security-types:
>>  - The issue isn't wiping out all data for a customer; it's wiping out
>> *specific* data. Using the "forget an encryption key" method would then
>> mean separate encryption keys per row, which isn't feasible most of the
>> time. (Consider information that becomes classified but didn't use to be,
>> for example.)
>>  - In some cases, decryption can still happen without keys, by brute force
>> or from finding weaknesses in the algorithms down the road. Yes, I know
>> that the brute force CPU time is measured in eons, but never say never; we
>> can easily decrypt things now that were encrypted with the best available
>> algorithms and keys 40 years ago. :)
>> So for cases where it counts, a "secure delete" means no less than writing
>> over the data with random strings. It would be interesting to add features
>> to HBase / HDFS that passed muster for stuff like this; for example, an
>> HDFS secure-delete
>> <http://www.ghacks.net/2010/08/26/securely-delete-files-with-secure-delete/>
>> command, and an HBase secure-delete that does all of: add delete marker,
>> force major compaction, and run HDFS secure-delete.
>> Ian
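
The "write over the data with random strings" step can be sketched as follows. This is a minimal illustration on a local file only: the thread describes an HDFS secure-delete as a feature that would have to be added, so nothing here is an HDFS or HBase API. The function name `overwrite_and_delete` and its parameters are hypothetical. In-place overwriting also assumes the filesystem and device actually rewrite the same physical blocks, which is not guaranteed on SSDs or on copy-on-write and journaling filesystems.

```python
import os
import tempfile


def overwrite_and_delete(path, passes=3, chunk_size=64 * 1024):
    """Overwrite a local file in place with random bytes, then unlink it.

    Hypothetical helper for illustration; not an HDFS or HBase call.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass to the device
    os.remove(path)
    return size * passes  # total bytes written across all passes


def demo():
    # Create a throwaway file, wipe it, and report what happened.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"classified payload" * 100)  # 1800 bytes
    written = overwrite_and_delete(path, passes=2)
    return written, os.path.exists(path)
```

In the sequence Ian describes, a step like this would only run after the delete marker and the major compaction, against the files the compaction left behind, and it would have to reach every replica of every block.
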
>> On Jun 20, 2013, at 7:39 AM, Jean-Marc Spaggiari wrote:
>> Correct, that's another way. You just need to have one encryption key per
>> customer, and everything that is written into HBase, across all the tables,
>> is encrypted with that key.
>> If the customer wants to have all its data erased, just erase the key,
>> and you have no way to retrieve anything from HBase even if it's still
>> in all the tables. So now you can emit all the deletes required, and
>> the data will be totally deleted on the next regular major compaction...
>> There will be a small impact on regular reads/writes since you will
>> need to read the key first, but then a user delete will be way more
>> efficient.
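
The per-customer key idea above can be sketched as follows. Everything here is an assumption made for illustration: `CustomerKeyStore`, `encrypt`, and `decrypt` are hypothetical names, the key store is an in-memory dict, and the SHA-256 counter keystream is a toy stand-in so the example runs on the standard library alone. A real deployment would use a vetted authenticated cipher (for example AES-GCM from a cryptography library) and a durable key store that can itself be wiped.

```python
import hashlib
import os


class CustomerKeyStore:
    """Hypothetical in-memory key store: one random key per customer."""

    def __init__(self):
        self._keys = {}

    def key_for(self, customer_id):
        # Create the key on first use.
        if customer_id not in self._keys:
            self._keys[customer_id] = os.urandom(32)
        return self._keys[customer_id]

    def get(self, customer_id):
        return self._keys[customer_id]  # raises KeyError once forgotten

    def forget(self, customer_id):
        # "Erasing" the customer: drop the key; ciphertext may still exist.
        self._keys.pop(customer_id, None)


def _keystream(key, nonce, length):
    # Toy keystream (SHA-256 in counter mode). Illustration only, not vetted.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, plaintext):
    nonce = os.urandom(16)  # fresh nonce per value, stored with the ciphertext
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))
```

A write would store `encrypt(store.key_for("acme"), value)` as the cell value and a read would call `decrypt(store.get("acme"), cell)`. After `store.forget("acme")`, `store.get("acme")` raises `KeyError`, so the cells still sitting in the tables can no longer be decrypted while the regular deletes wait for the next major compaction.
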
>> 2013/6/20 lars hofhansl <[EMAIL PROTECTED]>:
>> IMHO the "proper" way of doing such things is encryption.
>> 0-ing the values or even overwriting with a pattern typically leaves
>> traces of the old data on a magnetic platter that can be retrieved with
>> proper forensics. (Secure erase of SSD is typically pretty secure, though).
>> For such use cases, files (HFiles) should be encrypted and the decryption
>> keys should just be forgotten at the appropriate times.
>> I realize that for J-D's specific use case doing this at the HFile level
>> would be very difficult.
>> Maybe the KVs' values could be stored encrypted with a user-specific key.
>> Deleting the user's data then means forgetting that user's key.
>> -- Lars
>> ________________________________
>> From: Matt Corgan <[EMAIL PROTECTED]>
>> To: dev <[EMAIL PROTECTED]>
>> Sent: Wednesday, June 19, 2013 2:15 PM
>> Subject: Re: Efficiently wiping out random data?
>> Would it be possible to zero out all the value bytes for cells in existing
>> HFiles? The keys would remain, but if you knew that ahead of time you
>> could design your keys so they don't contain important info.
>> On Wed, Jun 19, 2013 at 11:28 AM, Ian Varley <[EMAIL PROTECTED]> wrote:
>> At least in some cases, the answer to that question ("do you even have to