Pig >> mail # user >> Advanced HDFS operations from Python embedded scripts


Jakub Glapa 2013-01-17, 22:11
Clément MATHIEU 2013-01-18, 09:12
Re: Advanced HDFS operations from Python embedded scripts
that looks promising, thanks Clement!

--
regards,
pozdrawiam,
Jakub Glapa
On Fri, Jan 18, 2013 at 9:12 AM, Clément MATHIEU <[EMAIL PROTECTED]> wrote:

> On 2013-01-17 23:11, Jakub Glapa wrote:
>
> Hi Jakub,
>
>
>> My Pig script is going to produce a set of files that will be an input for
>> a different process. The script would be running periodically so the
>> number of files would be growing.
>> I would like to implement an expiry mechanism where I could remove files
>> that are older than x or once the number of files has reached some threshold.
>>
>> I know a crazy way where in a bash script you can call "hadoop fs -ls ...",
>> parse the output and then execute "rmr" on matching entries.
>>
>> Is there a "human" way to do this from within a Python script? Pig.fs()
>>
>
> I had the same issue as you a few months ago. The public Pig scripting API
> only exposes an FsShell object, which is too limited to do any real work.
> However, it is possible to get access to the Hadoop FileSystem API from a
> Python script:
>
>
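> from org.apache.hadoop.fs import FileSystem
> from org.apache.pig.backend.hadoop.datastorage import ConfigurationUtil
> from org.apache.pig.scripting import ScriptPigContext
>
> def get_fs():
>     """Return an org.apache.hadoop.fs.FileSystem instance."""
>     # The Pig scripting API exports an FsShell but not a FileSystem object,
>     # so build a Hadoop Configuration from the Pig properties instead.
>     ctx   = ScriptPigContext.get()
>     props = ctx.getPigContext().getProperties()
>     conf  = ConfigurationUtil.toConfiguration(props)
>     fs    = FileSystem.get(conf)
>     return fs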
>
>
> Once you have a FileSystem object you can do whatever you want using the
> standard Hadoop API.
>
>
> Hope this helps.
>
> -- Clément
>
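
As a rough illustration of the expiry mechanism asked about above, here is a minimal sketch built on a FileSystem object such as the one get_fs() returns. The names select_expired and expire_files, and their parameters, are hypothetical and not part of Pig or Hadoop. It assumes the output files sit directly under one directory and that FileStatus.isDir() is available (newer Hadoop releases prefer isDirectory()). It has not been run against a real cluster, so treat it as a starting point only.

```python
def select_expired(entries, now_ms, max_age_ms=None, max_files=None):
    """Return the paths to delete from a list of (path, mtime_ms) pairs.

    A file is selected if it is older than max_age_ms, or if it falls
    outside the max_files most recently modified files. Hypothetical helper.
    """
    # Newest first, so ranks below max_files are the files we keep.
    newest_first = sorted(entries, key=lambda e: e[1], reverse=True)
    expired = []
    for rank, (path, mtime_ms) in enumerate(newest_first):
        too_old = max_age_ms is not None and now_ms - mtime_ms > max_age_ms
        over_limit = max_files is not None and rank >= max_files
        if too_old or over_limit:
            expired.append(path)
    return expired


def expire_files(fs, directory, max_age_ms=None, max_files=None):
    """Delete expired plain files directly under `directory`; return them."""
    # Imported here so select_expired stays usable without Hadoop available.
    import time
    from org.apache.hadoop.fs import Path

    # Some Hadoop versions return null for a missing path, hence `or []`.
    statuses = fs.listStatus(Path(directory)) or []
    # FileStatus.getModificationTime() is in milliseconds since the epoch.
    entries = [(s.getPath(), s.getModificationTime())
               for s in statuses if not s.isDir()]
    now_ms = int(time.time() * 1000)
    expired = select_expired(entries, now_ms, max_age_ms, max_files)
    for path in expired:
        fs.delete(path, False)  # False: non-recursive, these are plain files
    return expired
```

From an embedded script this could be called as expire_files(get_fs(), output_dir, max_files=10), where output_dir is whatever directory the Pig script writes to. Keeping the selection logic in a separate pure function means the thresholds can be checked without touching HDFS.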