HDFS, mail # user - Re: executing hadoop commands from python?


Re: executing hadoop commands from python?
Harsh J 2013-02-17, 14:09
Instead of 'scraping' this way, consider using a library such as
Pydoop (http://pydoop.sourceforge.net), which provides Pythonic APIs
for interacting with Hadoop components. Other libraries are covered
at
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/
as well.
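
The Pydoop route might look roughly like the sketch below. This is an
assumption-laden illustration, not documented Pydoop usage verbatim: it
assumes Pydoop is installed (`pip install pydoop`), that an HDFS cluster is
reachable from the machine, and that `pydoop.hdfs.ls()` returns a list of
full path strings.

```python
# Sketch: listing an HDFS directory via Pydoop instead of shelling out.
# Assumptions (not verified here): pydoop is installed and a cluster is
# reachable; hdfs.ls() is assumed to return a list of full path strings.
import posixpath

def basenames(paths):
    """Strip directory components from path strings.

    Pure string work (HDFS paths use '/' separators), so this part
    needs no cluster at all.
    """
    return [posixpath.basename(p) for p in paths]

def count_hdfs_entries(path):
    """Return (entry count, entry basenames) for an HDFS directory."""
    import pydoop.hdfs as hdfs  # imported lazily; needs a live cluster to use
    entries = hdfs.ls(path)     # e.g. ['hdfs://nn:8020/query/path/part-00000', ...]
    return len(entries), basenames(entries)
```

Compared with scraping `bin/hadoop` output, this hands back real Python
lists, so "count the number of files" is just `len()` on the result.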

On Sun, Feb 17, 2013 at 4:17 AM, jamal sasha <[EMAIL PROTECTED]> wrote:
> Hi,
>
>   This might be more of a python centric question but was wondering if
> anyone has tried it out...
>
> I am trying to run few hadoop commands from python program...
>
> For example if from command line, you do:
>
>       bin/hadoop dfs -ls /hdfs/query/path
>
> it returns all the files in the hdfs query path..
> So very similar to unix
>
>
> Now I am trying to basically do this from python.. and do some manipulation
> from it.
>
>      exec_str = "path/to/hadoop/bin/hadoop dfs -ls " + query_path
>      os.system(exec_str)
>
> Now, I am trying to grab this output to do some manipulation in it.
> For example.. count number of files?
> I looked into subprocess module but then... these are not native shell
> commands. hence not sure whether i can apply those concepts
> How to solve this?
>
> Thanks
>
>

--
Harsh J
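
On the capture question itself: `subprocess` does not require the command
to be a shell builtin; it runs any executable on disk, `bin/hadoop`
included, and unlike `os.system` it hands the output back to Python. A
minimal sketch (the hadoop path in the usage comment and the "Found N
items" header format are assumptions based on typical `hadoop dfs -ls`
output):

```python
import subprocess

def run_ls(cmd):
    """Run an external command and return its stdout as text.

    subprocess does not care whether the command is a shell builtin;
    any executable (like path/to/hadoop/bin/hadoop) works the same way.
    """
    return subprocess.check_output(cmd).decode("utf-8")

def count_files(ls_output):
    """Count entry lines in `hadoop dfs -ls`-style output.

    Assumption: the listing starts with a 'Found N items' summary line,
    which we skip; every remaining non-blank line is one entry.
    """
    lines = [l for l in ls_output.splitlines() if l.strip()]
    return len([l for l in lines if not l.startswith("Found")])

# Usage (assumes bin/hadoop is at that path and HDFS is reachable):
# out = run_ls(["path/to/hadoop/bin/hadoop", "dfs", "-ls", "/hdfs/query/path"])
# print(count_files(out))
```

`subprocess.check_output` also raises `CalledProcessError` on a non-zero
exit code, which `os.system` silently folds into its return value.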
anuj maurice 2013-02-17, 13:42