Pig, mail # user - Advanced HDFS operations from Python embedded scripts


Jakub Glapa 2013-01-17, 22:11
Hi,
My Pig script is going to produce a set of files that will be the input for
a different process. The script will run periodically, so the number
of files will keep growing.
I would like to implement an expiry mechanism where I could remove files
that are older than X, or once the number of files reaches some threshold.
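The expiry policy described above (delete anything older than a cutoff, and keep at most N files) can be expressed as a small selection function, independent of how the listing is obtained from HDFS. This is a minimal sketch, assuming each output is represented as a hypothetical (path, modification time) pair; `select_expired`, `max_age` and `max_count` are illustrative names, not part of Pig or Hadoop.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical representation of one HDFS entry: (path, modification time).
Entry = Tuple[str, datetime]


def select_expired(entries: List[Entry], now: datetime,
                   max_age: timedelta, max_count: int) -> List[str]:
    """Return the paths to delete: anything older than max_age, plus any
    entries beyond the newest max_count survivors (assumes max_count >= 0)."""
    # Sort newest first so the first max_count young entries are the ones kept.
    newest_first = sorted(entries, key=lambda e: e[1], reverse=True)
    expired = []
    kept = 0
    for path, mtime in newest_first:
        if now - mtime > max_age or kept >= max_count:
            expired.append(path)
        else:
            kept += 1
    return expired


if __name__ == "__main__":
    now = datetime(2013, 1, 17, 22, 11)
    runs = [
        ("/out/run1", now - timedelta(days=10)),
        ("/out/run2", now - timedelta(days=3)),
        ("/out/run3", now - timedelta(days=2)),
        ("/out/run4", now - timedelta(days=1)),
    ]
    # Keep at most 2 outputs, none older than a week.
    print(select_expired(runs, now, timedelta(days=7), 2))
```

Keeping the policy separate from the HDFS calls means it can be tested without a cluster; the returned paths would then be handed to whatever delete mechanism is available.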

I know a crazy way: in a bash script you can call "hadoop fs -ls ...",
parse the output and then run "hadoop fs -rmr" on the matching entries.
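At least the parsing step of that approach can be done in Python instead of bash. This is a minimal sketch, assuming the Hadoop 1.x `hadoop fs -ls` layout (a "Found N items" header, then permissions, replication, owner, group, size, date, time and path per line, with dates as `yyyy-MM-dd HH:mm`); other Hadoop versions may format the listing differently, and `parse_ls_output` is a hypothetical helper name.

```python
from datetime import datetime
from typing import List, Tuple


def parse_ls_output(text: str) -> List[Tuple[str, datetime]]:
    """Turn `hadoop fs -ls` output into (path, modification time) pairs."""
    entries = []
    for line in text.splitlines():
        # Skip blank lines and the "Found N items" header.
        if not line.strip() or line.startswith("Found "):
            continue
        # Assumed layout: perms, replication, owner, group, size, date, time, path.
        # Splitting at most 7 times leaves a path containing spaces intact.
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue  # not an entry line in the assumed format
        date_str, time_str, path = fields[5], fields[6], fields[7]
        mtime = datetime.strptime(date_str + " " + time_str, "%Y-%m-%d %H:%M")
        entries.append((path, mtime))
    return entries


if __name__ == "__main__":
    sample = (
        "Found 2 items\n"
        "drwxr-xr-x   - jakub supergroup          0 2013-01-10 08:00 /out/run1\n"
        "-rw-r--r--   3 jakub supergroup      12345 2013-01-17 22:11 /out/part-00000\n"
    )
    for path, mtime in parse_ls_output(sample):
        print(path, mtime)
```

The sample listing is illustrative only; in practice the text would come from running the command, and the resulting pairs could feed an age/count expiry check before issuing the deletes.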

Is there a "human" way to do this from within a Python script? Pig.fs()
doesn't help, because it doesn't return the command's output to the script,
but maybe I'm missing something?
How could I approach this differently, other than writing a Java program or
using the shell? Python looks like a great idea, but it seems a bit limited,
at least in Pig 0.10.1.

I appreciate any help!

--
regards,
Jakub Glapa