Re: executing hadoop commands from python?
Instead of 'scraping' this way, consider using a library such as
Pydoop (http://pydoop.sourceforge.net), which provides Pythonic APIs
for interacting with Hadoop components. Other Python frameworks for
Hadoop are also covered at
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/.
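
For example, a minimal sketch of listing an HDFS directory with Pydoop's
hdfs module (assuming Pydoop is installed and can locate your Hadoop
configuration; the path below is illustrative):

    import pydoop.hdfs as hdfs

    query_path = "/hdfs/query/path"  # illustrative path

    # List the directory contents as plain HDFS paths.
    entries = hdfs.ls(query_path)
    print("number of entries: %d" % len(entries))
    for entry in entries:
        print(entry)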

On Sun, Feb 17, 2013 at 4:17 AM, jamal sasha <[EMAIL PROTECTED]> wrote:
> Hi,
>
>   This might be more of a Python-centric question, but I was wondering if
> anyone has tried it out...
>
> I am trying to run a few Hadoop commands from a Python program...
>
> For example, if from the command line you run:
>
>       bin/hadoop dfs -ls /hdfs/query/path
>
> it returns all the files in the HDFS query path, very much like the Unix ls
> command.
>
> Now I am trying to do the same thing from Python and do some manipulation
> with the result.
>
>      exec_str = "path/to/hadoop/bin/hadoop dfs -ls " + query_path
>      os.system(exec_str)
>
> Now I am trying to grab this output and do some manipulation on it,
> for example to count the number of files.
> I looked into the subprocess module, but since these are not native shell
> commands, I am not sure whether those concepts apply.
> How do I solve this?
>
> Thanks
>
>

--
Harsh J
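
For the question in the quoted message about capturing the command's output,
a minimal sketch using the standard subprocess module (Python 2.7+; the
hadoop binary location and HDFS path below are illustrative):

    import subprocess

    query_path = "/hdfs/query/path"  # illustrative path

    # Run the hadoop CLI and capture its stdout (os.system() discards it).
    output = subprocess.check_output(
        ["path/to/hadoop/bin/hadoop", "dfs", "-ls", query_path])

    # 'hadoop dfs -ls' prints a "Found N items" header, then one line per entry.
    entries = [line for line in output.decode().splitlines()
               if line and not line.startswith("Found")]
    print("number of entries: %d" % len(entries))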