HDFS >> mail # dev >> Secure deletion of blocks


Re: Secure deletion of blocks
Hi Matt,

I'd also recommend implementing this in a somewhat pluggable way, e.g. a
configuration for a Deleter class. The default Deleter can be the one we
use today, which just removes the file, and you could plug in a
SecureDeleter. I can also see use cases for a Deleter implementation
which doesn't actually delete the block, but instead moves it to a local
trash directory which is emptied a day or two later. This sort of policy
can help recover data as a last-ditch effort if there is some kind of
accidental deletion and there aren't snapshots in place.
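
To make the pluggable idea concrete, here is a minimal sketch in Java. The
Deleter interface and the SimpleDeleter, SecureDeleter and TrashDeleter
classes are hypothetical names used for illustration; none of them exist in
HDFS today, and the three overwrite passes (zeros, ones, random) are only one
assumed policy. It operates on a plain local file and ignores the block's
metadata file, which a real DataNode change would also have to handle.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.SecureRandom;
import java.util.Arrays;

/** Hypothetical plug-in point; not an existing HDFS interface. */
interface Deleter {
    void delete(Path blockFile) throws IOException;
}

/** Stand-in for the current behaviour: just remove the file. */
class SimpleDeleter implements Deleter {
    @Override
    public void delete(Path blockFile) throws IOException {
        Files.deleteIfExists(blockFile);
    }
}

/** Overwrites the file with 0x00, 0xFF, then random bytes before removing it. */
class SecureDeleter implements Deleter {
    private static final int BUF_SIZE = 64 * 1024;
    private final SecureRandom random = new SecureRandom();

    @Override
    public void delete(Path blockFile) throws IOException {
        overwrite(blockFile);
        Files.delete(blockFile);
    }

    /** Runs the three overwrite passes in place without removing the file. */
    void overwrite(Path blockFile) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(blockFile.toFile(), "rw")) {
            long length = raf.length();
            byte[] buf = new byte[BUF_SIZE];
            for (int pass = 0; pass < 3; pass++) {
                raf.seek(0);
                long remaining = length;
                while (remaining > 0) {
                    int n = (int) Math.min(buf.length, remaining);
                    fill(buf, pass);
                    raf.write(buf, 0, n);
                    remaining -= n;
                }
                // Ask the OS to flush this pass to the device before the next one.
                raf.getFD().sync();
            }
        }
    }

    private void fill(byte[] buf, int pass) {
        if (pass == 0) {
            Arrays.fill(buf, (byte) 0x00);
        } else if (pass == 1) {
            Arrays.fill(buf, (byte) 0xFF);
        } else {
            random.nextBytes(buf);
        }
    }
}

/** Moves the block into a local trash directory instead of deleting it. */
class TrashDeleter implements Deleter {
    private final Path trashDir;

    TrashDeleter(Path trashDir) {
        this.trashDir = trashDir;
    }

    @Override
    public void delete(Path blockFile) throws IOException {
        Files.createDirectories(trashDir);
        Files.move(blockFile, trashDir.resolve(blockFile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The code that deletes blocks would then call whichever Deleter is configured
rather than removing the file directly, and a separate periodic task (not
shown) would empty the trash directory. Note that overwriting in place may
not make data unrecoverable on SSDs or on copy-on-write or journaling
filesystems, so treat this as an illustration of the plug-in shape rather
than a guarantee.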

-Todd

On Thu, Aug 15, 2013 at 11:50 AM, Andrew Wang <[EMAIL PROTECTED]> wrote:

> Hi Matt,
>
> Here are some code pointers:
>
> - When doing a file deletion, the NameNode turns the file into a set of
> blocks that need to be deleted.
> - When datanodes heartbeat in to the NN (see BPServiceActor#offerService),
> the NN replies with blocks to be invalidated (see BlockCommand and
> DatanodeProtocol.DNA_INVALIDATE).
> - The DN processes these invalidates in
> BPOfferService#processCommandFromActive (look for DNA_INVALIDATE again).
> - The magic lines you're looking for are probably in
> FsDatasetAsyncDiskService#run, since we delete blocks in the background.
>
> Best,
> Andrew
>
>
> On Thu, Aug 15, 2013 at 5:31 AM, Matt Fellows <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > I'm looking into writing a patch for HDFS which will provide a new
> > method to securely delete the contents of a block on all the nodes
> > upon which it exists. By "securely delete" I mean overwrite with
> > 1s, 0s and random data in repeated passes, such that the data could
> > not be recovered forensically.
> >
> > I'm not currently aware of any existing code / methods which provide
> > this, so was going to implement this myself.
> >
> > I figured the DataNode.java was probably the place to start looking into
> > how this could be done, so I've read the source for this, but it's not
> > really enlightened me a massive amount.
> >
> > I'm assuming I need to tell the NameNode that a particular block id
> > must be deleted from every DataNode that holds it; then, as each
> > DataNode calls home, it would be instructed to securely delete the
> > relevant block, and it would oblige.
> >
> > Unfortunately I have no idea where to begin and was looking for some
> > pointers?
> >
> > I guess specifically I'd like to know:
> >
> > 1. Where the hdfs CLI commands are implemented
> > 2. How a DataNode identifies a block / how a NameNode could inform a
> > DataNode to delete a block
> > 3. Where the existing "delete" is implemented so I can make sure my
> > secure delete makes use of it after successfully blanking the block
> > contents
> > 4. If I've got the right idea about this at all?
> >
> > Kind regards,
> > Matt Fellows
> >
> > --
> >  First Option Software Ltd
> > Signal House
> > Jacklyns Lane
> > Alresford
> > SO24 9JJ
> > Tel: +44 (0)1962 738232
> > Mob: +44 (0)7710 160458
> > Fax: +44 (0)1962 600112
> > Web: www.bespokesoftware.com <http://bespokesoftware.com/>
> >
> > ____________________________________________________
> >
> > This is confidential, non-binding and not company endorsed - see full
> > terms at www.fosolutions.co.uk/emailpolicy.html
> > <http://www.fosolutions.co.uk/emailpolicy.html>
> >
> > First Option Software Ltd Registered No. 06340261
> > Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
> > ____________________________________________________
> >
> >
>

--
Todd Lipcon
Software Engineer, Cloudera