Hadoop >> mail # user >> executing hadoop commands from python?


executing hadoop commands from python?
Hi,

  This might be more of a Python-centric question, but I was wondering if
anyone has tried it out.

I am trying to run a few Hadoop commands from a Python program.

For example, from the command line you would do:

      bin/hadoop dfs -ls /hdfs/query/path

which lists all the files under that HDFS path, very much like Unix ls.
Now I am trying to do the same thing from Python and then do some
manipulation on the output:

     import os

     exec_str = "path/to/hadoop/bin/hadoop dfs -ls " + query_path
     os.system(exec_str)

The problem is that os.system() only gives me the exit status, so I can't
grab the output to work with it, e.g. to count the number of files.
I looked into the subprocess module, but since these are not native shell
commands, I am not sure whether the same approach applies.
How can I solve this?
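Concretely, something along these lines is what I have in mind (a rough sketch on my part; the hadoop binary path is a placeholder, and the "Found N items" header parsing is an assumption about the -ls output format):

```python
import subprocess

def parse_ls_output(text):
    """Return the entry lines from `hadoop dfs -ls` output.

    Assumes the listing starts with a "Found N items" header followed by
    one line per file/directory; the header and blank lines are skipped.
    """
    lines = [l for l in text.splitlines() if l.strip()]
    return [l for l in lines if not l.startswith("Found")]

def list_hdfs(query_path, hadoop="path/to/hadoop/bin/hadoop"):
    # hadoop is a placeholder path -- adjust to the actual install.
    proc = subprocess.Popen(
        [hadoop, "dfs", "-ls", query_path],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(err.decode())
    return parse_ls_output(out.decode())

# e.g. number of files: len(list_hdfs("/hdfs/query/path"))
```

The point being that subprocess doesn't care whether the command is a shell builtin or an external binary like bin/hadoop; it just needs the argument list.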

Thanks