Pig >> mail # user >> Advanced HDFS operations from Python embedded scripts


Advanced HDFS operations from Python embedded scripts
Hi,
My Pig script produces a set of files that serve as input for
a different process. The script runs periodically, so the number
of files keeps growing.
I would like to implement an expiry mechanism where I could remove files
that are older than x, or once the number of files has reached some threshold.
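
To make the rule concrete, here is a minimal sketch of the selection step in plain Python. The function name, the (path, mtime) tuple layout, and the choice to drop the oldest files once the threshold is exceeded are all assumptions for illustration; nothing here comes from the Pig API.

```python
def select_expired(files, now, max_age_seconds, max_files):
    """Return the paths that should be removed.

    files is a list of (path, mtime) tuples, with mtime and now given in
    seconds since the epoch (hypothetical layout, chosen for this sketch).
    A file is selected if it is older than max_age_seconds, or if it falls
    outside the max_files most recent entries.
    """
    # Sort newest first, so every index >= max_files is surplus.
    newest_first = sorted(files, key=lambda f: f[1], reverse=True)
    expired = []
    for index, (path, mtime) in enumerate(newest_first):
        too_old = (now - mtime) > max_age_seconds
        surplus = index >= max_files
        if too_old or surplus:
            expired.append(path)
    return expired
```

Sorting newest first keeps both conditions in one pass: age is checked per file, and the count threshold becomes a simple index comparison.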

I know a crazy way where, in a bash script, you call "hadoop fs -ls ...",
parse the output, and then execute "rmr" on the matching entries.
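
For reference, the same parse-and-remove approach can be sketched in Python instead of bash. This assumes the usual eight-column "hadoop fs -ls" listing (permissions, replication, owner, group, size, date, time, path) and that the hadoop command is on the PATH; parse_ls_output and remove_paths are hypothetical helper names, and timestamps are interpreted in the local time zone.

```python
import subprocess
import time


def parse_ls_output(text):
    """Turn `hadoop fs -ls` text into a list of (path, mtime) tuples."""
    entries = []
    for line in text.splitlines():
        # Split into at most 8 fields so that a path with spaces stays whole.
        fields = line.split(None, 7)
        # Skip the "Found N items" header and any unexpected line.
        if len(fields) != 8:
            continue
        stamp = fields[5] + " " + fields[6]
        try:
            parsed = time.strptime(stamp, "%Y-%m-%d %H:%M")
        except ValueError:
            continue
        entries.append((fields[7], time.mktime(parsed)))
    return entries


def remove_paths(paths):
    """Run `hadoop fs -rmr` once per path (assumes hadoop is on the PATH)."""
    for path in paths:
        subprocess.call(["hadoop", "fs", "-rmr", path])
```

The listing itself would still have to be captured by running "hadoop fs -ls" as a subprocess, so this only moves the parsing out of bash; it does not avoid the shell command.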

Is there a "human" way to do this from within a Python script? Pig.fs()
doesn't come in handy because it doesn't return anything to the script, but
maybe I'm missing something?
How could I approach this other than by writing a Java program or
using the shell? Python looks like a great idea but seems a bit limited, at
least in version 0.10.1.

I appreciate any help!

--
regards,
Jakub Glapa