On Mar 16, 2011, at 10:35 AM, W.P. McNeill wrote:
> On HDFS, anyone can run hadoop fs -rmr /* and delete everything.
In addition to what everyone else has said, I'm fairly certain that -rmr / is specifically safeguarded against. But /* might have slipped through the cracks.
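As an extra safety net on top of that, one common move (assuming a Hadoop version with trash support) is to enable the HDFS trash, so `-rm`/`-rmr` moves files into the user's `.Trash` directory instead of deleting them immediately. The interval below is illustrative:

```xml
<!-- core-site.xml: keep deleted files in .Trash instead of removing them at once -->
<property>
  <name>fs.trash.interval</name>
  <!-- minutes before trashed files are permanently deleted; 1440 = 1 day (illustrative) -->
  <value>1440</value>
</property>
```

That won't stop a determined `-rmr /*`, but it turns most fat-finger deletes into something recoverable.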
> What are examples of systems people working with high-value HDFS data put in
> place so that they can sleep at night?
I set in place crontabs where we randomly delete the entire file system to remind folks that HDFS is still immature.
OK, not really.
In reality, we basically have a policy that everyone signs off on before getting an account, stating that they understand Hadoop should not be considered 'primary storage': it is not a data warehouse, it is not backed up, and it could disappear at any moment. But we also make sure that the base (ETL'd) data lives on multiple grids; any other data should be reproducible from that base data.
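Keeping the base data on multiple grids can be as simple as a scheduled `distcp` run between clusters; the cluster names and paths here are made up for illustration:

```shell
# Illustrative sketch: mirror the base ETL data from the primary grid to a
# second grid. Namenode hostnames and paths are hypothetical.
hadoop distcp -update \
  hdfs://grid-a-nn:8020/data/base \
  hdfs://grid-b-nn:8020/data/base
```

With the base data mirrored, losing derived data on one grid is an inconvenience rather than a disaster, since everything downstream can be regenerated.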