Advanced HDFS operations from Python embedded scripts
Jakub Glapa 2013-01-17, 22:11
My Pig script is going to produce a set of files that will be the input for
a different process. The script will run periodically, so the number
of files will keep growing.
I would like to implement an expiry mechanism that removes files once they
are older than x or once the number of files has reached some threshold.
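
To make the rule concrete, here is a minimal sketch of the selection step only, under one possible reading: a file expires if it is too old, or if it is not among the newest N files. The function name select_expired, its parameters, and the sample paths are all made up for illustration, and getting the listing out of HDFS is not shown.

```python
def select_expired(files, now, max_age_seconds, max_files):
    """Return the paths to delete from a list of (path, mtime_seconds) pairs.

    A file expires if it is older than max_age_seconds, or if it is not
    among the max_files most recently modified files.
    """
    # Newest first, so every entry at index >= max_files exceeds the count limit.
    newest_first = sorted(files, key=lambda f: f[1], reverse=True)
    expired = []
    for index, (path, mtime) in enumerate(newest_first):
        too_old = (now - mtime) > max_age_seconds
        over_limit = index >= max_files
        if too_old or over_limit:
            expired.append(path)
    return expired


# Hypothetical listing: (path, modification time in seconds since the epoch).
listing = [
    ("/out/part-a", 1000),
    ("/out/part-b", 5000),
    ("/out/part-c", 9000),
]
expired = select_expired(listing, now=10000, max_age_seconds=6000, max_files=2)
```

With these made-up values only /out/part-a is selected, because it is the only entry older than the age limit and the count limit of two is not exceeded by the rest.
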
I know a crazy way where, in a bash script, you call "hadoop fs -ls ...",
parse the output, and then execute "rmr" on the matching entries.
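
For reference, the parsing half of that approach might look roughly like the sketch below. It assumes the usual eight-column "hadoop fs -ls" layout (permissions, replication, owner, group, size, date, time, path); the sample text, the cutoff date, and the helper name parse_ls are invented for illustration, and the actual "rmr" call is left out.

```python
from datetime import datetime


def parse_ls(output):
    """Parse `hadoop fs -ls` text into (path, modified datetime) pairs."""
    entries = []
    for line in output.splitlines():
        # At most 8 fields, so a path containing spaces stays in one piece.
        fields = line.split(None, 7)
        # Skip the "Found N items" header and anything that is not an entry.
        if len(fields) != 8:
            continue
        modified = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        entries.append((fields[7], modified))
    return entries


# Invented sample output, in the assumed column layout.
sample = """Found 2 items
-rw-r--r--   3 user group       1024 2013-01-10 08:30 /out/part-a
-rw-r--r--   3 user group       2048 2013-01-17 22:11 /out/part-b
"""
cutoff = datetime(2013, 1, 15)
# Each path collected here would then be passed to "hadoop fs -rmr".
to_remove = [path for path, modified in parse_ls(sample) if modified < cutoff]
```
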
Is there a "human" way to do this from within a Python script? Pig.fs()
doesn't come in handy because it doesn't return anything to the script, but
maybe I'm missing something?
How could I approach this other than by writing a Java program or
using the shell? Python embedding looks like a great idea but seems a bit
limited, at least in Pig 0.10.1.
I appreciate any help!