I'm looking into writing a patch for HDFS that adds a new method to
securely delete the contents of a block on every node that holds a
replica. By "securely delete" I mean overwriting the block with
1's/0's/random data in repeated passes so that the data cannot be recovered.
I'm not currently aware of any existing code or methods that provide this,
so I was going to implement it myself.
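To make the idea concrete, here is a minimal sketch of the kind of multi-pass overwrite I mean, assuming a block is just a plain file on the DataNode's local disk. The class name SecureOverwrite, the three-pass pattern (0x00, 0xFF, random) and the buffer size are my own assumptions for illustration, not existing HDFS code. Overwriting in place also only helps if the underlying filesystem and device really rewrite the same physical sectors, which is not guaranteed on SSDs or copy-on-write filesystems.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical helper, not part of HDFS: overwrites a local file in place.
final class SecureOverwrite {
    private static final int BUFFER_SIZE = 64 * 1024;
    private static final int PASSES = 3;

    /** Overwrites the whole file three times: 0x00, then 0xFF, then random bytes. */
    static void overwrite(String path) throws IOException {
        SecureRandom random = new SecureRandom();
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            long length = file.length();
            byte[] buffer = new byte[BUFFER_SIZE];
            for (int pass = 0; pass < PASSES; pass++) {
                file.seek(0);
                long remaining = length;
                while (remaining > 0) {
                    int chunk = (int) Math.min(buffer.length, remaining);
                    fillBuffer(buffer, pass, random);
                    file.write(buffer, 0, chunk);
                    remaining -= chunk;
                }
                // Ask the OS to flush this pass to the device before starting the next.
                file.getFD().sync();
            }
        }
    }

    /** Fills the buffer with the pattern for the given pass. */
    private static void fillBuffer(byte[] buffer, int pass, SecureRandom random) {
        switch (pass) {
            case 0:
                Arrays.fill(buffer, (byte) 0x00);
                break;
            case 1:
                Arrays.fill(buffer, (byte) 0xFF);
                break;
            default:
                random.nextBytes(buffer);
                break;
        }
    }
}
```

Calling SecureOverwrite.overwrite("/path/to/block/file") would leave a file of the same length containing only the last pass's random bytes; the normal delete would then run afterwards.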
I figured DataNode.java was probably the place to start looking into
how this could be done, so I've read the source, but it hasn't really
enlightened me much.
I'm assuming I need to tell the NameNode that a particular block id must
be securely deleted from every DataNode that holds it; then, as each
DataNode calls home, it would be instructed to securely delete the
relevant block, and it would oblige.
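The flow I'm assuming could be modelled roughly like this. It is a standalone toy and not HDFS code: ToyNameNode, addReplica, requestSecureDelete and heartbeat are all hypothetical names, and DataNodes are identified by plain strings. It only shows the idea of queueing a command per DataNode and handing it over the next time that DataNode calls home.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the assumed flow; none of these names exist in HDFS.
final class ToyNameNode {
    // Block id -> ids of the DataNodes that hold a replica of it.
    private final Map<Long, List<String>> blockLocations = new HashMap<>();
    // DataNode id -> block ids it should securely delete on its next heartbeat.
    private final Map<String, List<Long>> pendingSecureDeletes = new HashMap<>();

    /** Records that the given DataNode holds a replica of the block. */
    void addReplica(long blockId, String dataNodeId) {
        blockLocations.computeIfAbsent(blockId, k -> new ArrayList<>()).add(dataNodeId);
    }

    /** Queues a secure-delete command for every DataNode holding the block. */
    void requestSecureDelete(long blockId) {
        List<String> holders = blockLocations.remove(blockId);
        if (holders == null) {
            return; // unknown block, nothing to queue
        }
        for (String dataNodeId : holders) {
            pendingSecureDeletes
                .computeIfAbsent(dataNodeId, k -> new ArrayList<>())
                .add(blockId);
        }
    }

    /** Called when a DataNode "calls home"; returns and clears its queued block ids. */
    List<Long> heartbeat(String dataNodeId) {
        List<Long> commands = pendingSecureDeletes.remove(dataNodeId);
        return commands == null ? Collections.<Long>emptyList() : commands;
    }
}
```

In this model each DataNode would run its overwrite on the returned block ids and then fall through to the ordinary delete; a real patch would also need some acknowledgement so the NameNode knows when every replica has actually been wiped.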
Unfortunately I have no idea where to begin and was looking for some
pointers.
I guess specifically I'd like to know:
1. Where the hdfs CLI commands are implemented
2. How a DataNode identifies a block, and how the NameNode could tell a
DataNode to delete a block
3. Where the existing "delete" is implemented so I can make sure my secure
delete makes use of it after successfully blanking the block contents
4. If I've got the right idea about this at all?
Andrew Wang 2013-08-15, 18:50
Todd Lipcon 2013-08-15, 21:17
Colin McCabe 2013-08-20, 19:42
Colin McCabe 2013-08-20, 19:43
Matt Fellows 2013-08-20, 22:14