Yes, it would work with streaming, but note that if your os.system(…)
call prints anything to stdout, that output is treated as the task's
output and is sent on to the reducers (and ultimately to HDFS).
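One way around that (a minimal sketch, not your original script) is to invoke the command via subprocess instead of os.system, so its stdout can be discarded rather than leaking into the mapper's output stream. The URL here is just the placeholder from your example:

```python
import subprocess

def run_silent(cmd):
    """Run cmd with its stdout/stderr suppressed; return the exit code."""
    return subprocess.call(cmd,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)

# e.g. run_silent(["curl", "http://localhost", "--data", payload])
```

Anything you *do* want to reach the reducers you then print yourself, explicitly.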
P.S. I assume the example you've posted is simplified, but if it is
not, reconsider appending all those lines into one string. You don't
want to hold that much data in memory when running over large files,
and the shell would not accept a command line as long as, say, a 64 MB
input block anyway (it would exceed the OS argument-length limit).
On Fri, Aug 16, 2013 at 4:53 AM, jamal sasha <[EMAIL PROTECTED]> wrote:
> Lets say that I have a data which interacts with a rest api like
> %curl hostname data
> Now, I have the following script:
> #!/usr/bin/env python
> import sys,os
> cmd = """curl http://localhost --data '"""
> string = " "
> for line in sys.stdin:
>     line = line.rstrip(os.linesep)
>     string += line
> os.system(cmd + string + "'")
> Now, if I give a sample file for data, and run the above script with
> cat data.txt | python mapper.py
> it works perfectly. But will this work if I execute it on Hadoop as well?
> I am trying to set up Hadoop in local mode to check it out, but I think
> it will take me some time to get there.
> Any experiences, suggestions?