HBase >> mail # dev >> Efficiently wiping out random data?

Re: Efficiently wiping out random data?
Right, compaction followed by a "secure" HDFS-level delete that overwrites
the remnant blocks with random data. Even then it's difficult to guarantee
that nothing recoverable remains, though in practical terms the
hypothetical user here could be assured that no API of HBase or HDFS could
ever retrieve the data.

Or burn the platters to ash.
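The random-rewrite idea above can be sketched with plain JDK file I/O. This is a sketch only: the class name `SecureWipe` and the pass count are illustrative, a real HDFS version would have to reach the block files on each DataNode, and SSDs or copy-on-write filesystems may still retain old copies regardless.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;

public class SecureWipe {
    // Overwrite a file's contents with random bytes for several passes,
    // syncing each pass to stable storage, then delete the file.
    static void wipe(Path file, int passes) throws IOException {
        SecureRandom rng = new SecureRandom();
        byte[] buf = new byte[8192];
        long len = Files.size(file);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            for (int p = 0; p < passes; p++) {
                raf.seek(0);
                long remaining = len;
                while (remaining > 0) {
                    rng.nextBytes(buf);
                    int n = (int) Math.min(buf.length, remaining);
                    raf.write(buf, 0, n);
                    remaining -= n;
                }
                raf.getFD().sync(); // force this pass out of the page cache
            }
        }
        Files.delete(file);
    }
}
```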

On Sunday, June 23, 2013, Ian Varley wrote:

> One more followup on this, after talking to some security-types:
>  - The issue isn't wiping out all data for a customer; it's wiping out
> *specific* data. Using the "forget an encryption key" method would then
> mean separate encryption keys per row, which isn't feasible most of the
> time. (Consider information that becomes classified but didn't use to be,
> for example.)
>  - In some cases, decryption can still happen without keys, by brute force
> or from finding weaknesses in the algorithms down the road. Yes, I know
> that the brute force CPU time is measured in eons, but never say never; we
> can easily decrypt things now that were encrypted with the best available
> algorithms and keys 40 years ago. :)
> So for cases where it counts, a "secure delete" means no less than writing
> over the data with random strings. It would be interesting to add features
> to HBase / HDFS that passed muster for stuff like this; for example, an
> HDFS secure-delete (http://www.ghacks.net/2010/08/26/securely-delete-files-with-secure-delete/)
> command, and an HBase secure-delete that does all of: add delete marker,
> force major compaction, and run HDFS secure-delete.
> Ian
> On Jun 20, 2013, at 7:39 AM, Jean-Marc Spaggiari wrote:
> Correct, that's another way. You just need one encryption key per
> customer, and everything written into HBase, across all the tables, is
> encrypted with that key.
> If the customer wants all their data erased, you just erase the key,
> and there is no way to retrieve anything from HBase even if the data is
> still sitting in all the tables. You can then emit all the required
> deletes, and the data will be fully removed at the next regular major
> compaction...
> There will be a small impact on regular reads/writes since you will
> need to read the key first, but a per-customer delete will be far more
> efficient.
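The per-customer-key scheme described above might look roughly like the following AES-GCM sketch. The class and method names are invented for illustration; the thread does not prescribe an API. The point it demonstrates is that once the customer's key is forgotten, every value encrypted under it is undecryptable.

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class PerCustomerCrypto {
    // Generate a fresh AES key for one customer (hypothetical helper).
    static SecretKey newCustomerKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        return kg.generateKey();
    }

    // Encrypt a cell value under the customer's key; a random 12-byte
    // GCM nonce is prepended to the ciphertext.
    static byte[] encryptValue(SecretKey key, byte[] value) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(value);
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Decrypt fails (AEADBadTagException) for any key other than the
    // one used to encrypt -- so deleting the key "deletes" the data.
    static byte[] decryptValue(SecretKey key, byte[] stored) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, stored, 0, 12));
        byte[] ct = new byte[stored.length - 12];
        System.arraycopy(stored, 12, ct, 0, ct.length);
        return c.doFinal(ct);
    }
}
```

As the thread notes, this trades a key lookup on every read/write for a much cheaper per-customer erase.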
> 2013/6/20 lars hofhansl <[EMAIL PROTECTED]>:
> IMHO the "proper" way of doing such things is encryption.
> 0-ing the values or even overwriting with a pattern typically leaves
> traces of the old data on a magnetic platter that can be retrieved with
> proper forensics. (Secure erase of SSD is typically pretty secure, though).
> For such use cases, files (HFiles) should be encrypted and the decryption
> keys should just be forgotten at the appropriate times.
> I realize that for J-D's specific use case doing this at the HFile level
> would be very difficult.
> Maybe the KVs' values could be stored encrypted with a user-specific key.
> Deleting the user's data then means forgetting that user's key.
> -- Lars
> ________________________________
> From: Matt Corgan <[EMAIL PROTECTED]>
> To: dev <[EMAIL PROTECTED]>
> Sent: Wednesday, June 19, 2013 2:15 PM
> Subject: Re: Efficiently wiping out random data?
> Would it be possible to zero out all the value bytes for cells in existing
> HFiles? The keys would remain, but if you knew that ahead of time you
> could design your keys so they don't contain important info.
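The zero-out idea amounts to overwriting only the value region of a serialized cell while leaving the key bytes readable. A minimal sketch, assuming a flat key-then-value layout for illustration (real HFile cells also carry length fields, timestamps, and tags):

```java
import java.util.Arrays;

public class ZeroValues {
    // Zero only the value region of a serialized cell so the key survives
    // for scans but the sensitive payload is gone. The layout (key bytes
    // followed by value bytes) is assumed, not HBase's actual format.
    static void zeroValueRegion(byte[] cell, int valueOffset, int valueLength) {
        Arrays.fill(cell, valueOffset, valueOffset + valueLength, (byte) 0);
    }
}
```

As the message notes, this only helps if the row keys themselves were designed to carry no sensitive information.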
> On Wed, Jun 19, 2013 at 11:28 AM, Ian Varley <[EMAIL PROTECTED]> wrote:
> At least in some cases, the answer to that question ("do you even have to
> destroy your tapes?") is a resounding "yes". For some extreme cases (think
> health care, privacy, etc), companies do all RDBMS backups to disk instead
> of tape for that reason. (Transaction logs are considered different, I

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)