Re: executing linux command from hadoop (python)
Yes, it would work with streaming, but note that if your os.system(…)
call produces any stdout output, that output is treated as task output
and is sent on to the reducers (or to HDFS, in a map-only job).
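A minimal sketch of avoiding that: run the external command with its stdout and stderr discarded, so nothing it prints can leak into the streaming output. (The helper name `run_quiet` and the `curl` invocation shown in the comment are illustrative, not part of the original script.)

```python
import os
import subprocess
import sys

def run_quiet(cmd):
    """Run cmd, discarding its stdout/stderr so that nothing the
    command prints is mistaken for mapper output. Returns the
    command's exit code."""
    with open(os.devnull, "w") as devnull:
        return subprocess.call(cmd, stdout=devnull, stderr=devnull)

# In the mapper loop, something like:
#   run_quiet(["curl", "--silent", "--data", line, "http://localhost"])
# replaces os.system(...), and also avoids shell-quoting problems
# when the input data contains quotes or other shell metacharacters.
```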

P.s. I assume the example you've produced is simplified, but if it is
not, reconsider appending all those strings together. You don't want
to hold that much data in memory when running over large files, nor
would a single command line support arguments as long as, say, a 64 MB
input block.
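One way to keep memory bounded, sketched below under the assumption that the endpoint can accept the data in pieces: group the input into fixed-size batches and send each batch separately, instead of concatenating the entire split into one string. (`BATCH_SIZE` and the `batches` helper are illustrative names, not from the original script.)

```python
import sys

BATCH_SIZE = 1000  # hypothetical cap; tune to what the endpoint accepts

def batches(stream, size=BATCH_SIZE):
    """Yield lists of at most `size` stripped lines, so memory use
    stays constant no matter how large the input split is."""
    batch = []
    for line in stream:
        batch.append(line.rstrip("\n"))
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch

# In the mapper, each batch would then be POSTed on its own:
#   for batch in batches(sys.stdin):
#       send(" ".join(batch))   # send() being whatever issues the request
```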

On Fri, Aug 16, 2013 at 4:53 AM, jamal sasha <[EMAIL PROTECTED]> wrote:
> Hi,
>  Lets say that I have a data which interacts with a rest api like
>
> %curl hostname data
>
> Now, I have the following script:
>
> #!/usr/bin/env python
> import sys,os
>
> cmd = """curl http://localhost  --data  '"""
> string = " "
> for line in sys.stdin:
>     line = line.rstrip(os.linesep)
>     string += line
>
> os.system(cmd + string+"'")
>
>
> Now, if i give a sample file for data, and run the above script with
>
> cat data.txt | python mapper.py
>
> It works perfectly. But will this work if I execute it on Hadoop as well?
> I am trying to set up Hadoop in local mode to check it out, but I think it
> will take me some time to get there.
> Any experiences, suggestions?
> Thanks
>

--
Harsh J