MapReduce, mail # user - Re: executing linux command from hadoop (python)


Re: executing linux command from hadoop (python)
Harsh J 2013-08-16, 03:33
Yes, it would work with streaming, but note that anything your
os.system(…) call prints to stdout is treated as task output and
is sent to HDFS or to the reducers.
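
To illustrate the stdout point, here is a minimal sketch (not the only way to
do it) that runs the external command through subprocess with its stdout
captured, so that only what the mapper writes deliberately ends up as task
output. The helper names run_quietly and emit are hypothetical, and the
tab-separated record format assumes streaming's default key/value separator.

```python
import subprocess
import sys


def run_quietly(argv, payload):
    """Run argv with payload on its stdin and capture what it prints.

    Because stdout is captured, nothing the child prints leaks into the
    mapper's own stdout, which streaming would treat as task output.
    """
    result = subprocess.run(
        argv,
        input=payload.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,  # the caller decides what a non-zero exit code means
    )
    return result.returncode, result.stdout.decode("utf-8", "replace")


def emit(key, value, out=sys.stdout):
    """Write one tab-separated record; this is the intended task output."""
    out.write(f"{key}\t{value}\n")
```

Diagnostics can go to sys.stderr instead, which keeps them out of the records
passed on to HDFS or the reducers.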

P.S. I assume the example you've posted is simplified, but if it is not,
reconsider appending all those strings together. You don't want to
hold that much data in memory when running over large files, nor
would a command line support arguments as long as, say, a 64 MB input block.
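
One way to avoid building a single huge string is to group the input into
bounded batches and send each batch separately. The following is only a
sketch: the batches helper and the 64 KB default are assumptions chosen for
illustration, not values taken from Hadoop or from your script.

```python
def batches(lines, max_bytes=64 * 1024):
    """Yield lists of lines whose combined UTF-8 size stays within max_bytes.

    A single line larger than max_bytes is still yielded, in a batch of
    its own, so no input is dropped.
    """
    batch, size = [], 0
    for line in lines:
        line = line.rstrip("\r\n")
        n = len(line.encode("utf-8"))
        # Flush the current batch before it would exceed the limit.
        if batch and size + n > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += n
    if batch:
        yield batch
```

In a mapper you would iterate over batches(sys.stdin) and hand each batch to
the command on its standard input rather than on its command line, for
example with curl --data-binary @-, so the argument length limit never comes
into play.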

On Fri, Aug 16, 2013 at 4:53 AM, jamal sasha <[EMAIL PROTECTED]> wrote:
> Hi,
>  Let's say that I have some data that I send to a REST API like
>
> %curl hostname data
>
> Now, I have the following script:
>
> #!/usr/bin/env python
> import sys,os
>
> cmd = """curl http://localhost  --data  '"""
> string = " "
> for line in sys.stdin:
>     line = line.rstrip(os.linesep)
>     string += line
>
> os.system(cmd + string+"'")
>
>
> Now, if I give it a sample data file and run the above script with
>
> cat data.txt | python mapper.py
>
> It works perfectly. But will this work if I execute it on Hadoop as well?
> I am trying to set up Hadoop in local mode to check it out, but I think it
> will take me some time to get there.
> Any experiences, suggestions?
> Thanks
>

--
Harsh J