Creating files through the hadoop streaming interface
Julian Bui 2013-02-07, 01:13
Hi hadoop users,
I am trying to use the streaming interface with a python mapper script
to create some files, but I am having trouble actually creating
files on HDFS.
I have a python script mapper with no reducers. Currently, it doesn't even
read the input and instead reads in the env variable for the output dir
(outdir = os.environ['mapred_output_dir']) and attempts to create an empty
file at that location. However, that appears to fail with the [vague]
error message appended to this email.
I am using the streaming interface because the python examples seem so
much cleaner and abstract a lot of the details away for me, but if I instead
need to use the java bindings (and create a mapper and reducer class) then
please let me know. I'm still learning hadoop. As I understand it, I
should be able to create files in hadoop, but perhaps that ability is
limited when using the streaming interface.
Further questions: If my mapper absolutely must send my output to stdout,
is there a way to rename the file after it has been created?
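On the rename question, here is a minimal sketch, assuming the mapper output lands in a file with the usual part-00000 style name and that the `hadoop` executable is on the PATH of the machine running the script. It wraps the `hadoop fs -mv` shell command and is meant to run after the job has finished. The function names (build_rename_command, rename_output) are hypothetical and used only for illustration.

```python
import subprocess


def build_rename_command(output_dir, part_name, new_name):
    """Build the argv for `hadoop fs -mv` inside one output directory.

    output_dir, part_name and new_name are illustrative parameters;
    the part-00000 naming is an assumption about the job's output.
    """
    base = output_dir.rstrip("/")
    return ["hadoop", "fs", "-mv",
            base + "/" + part_name,
            base + "/" + new_name]


def rename_output(output_dir, part_name, new_name,
                  run=subprocess.check_call):
    """Rename one output file after the job completes.

    `run` is injectable so the command can be inspected without
    a Hadoop installation; by default it executes the command and
    raises CalledProcessError on a non-zero exit status.
    """
    run(build_rename_command(output_dir, part_name, new_name))
```

For example, rename_output("/user/me/out", "part-00000", "testfile.txt") would move the first part file to testfile.txt in the same directory; the path here is made up.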
Python mapper code:
import os
outdir = os.environ['mapred_output_dir']
f = open(outdir + "/testfile.txt", "wb")
f.close()
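A possible reason the open() call above fails, offered as a hedged sketch rather than a confirmed diagnosis: mapred_output_dir typically names a location on HDFS, while python's built-in open() only writes to the local filesystem of the task node. One workaround is to write the file locally and then copy it with the `hadoop fs -put` shell command. The helper names below (build_put_command, create_hdfs_file) are hypothetical, and the sketch assumes the `hadoop` executable is on the task's PATH.

```python
import os
import subprocess
import tempfile


def build_put_command(local_path, hdfs_dir, name):
    """Build the argv for copying a local file to hdfs_dir/name."""
    return ["hadoop", "fs", "-put",
            local_path, hdfs_dir.rstrip("/") + "/" + name]


def create_hdfs_file(name, data=b"", env=os.environ,
                     run=subprocess.check_call):
    """Write `data` to a local temp file, then copy it into the job
    output directory taken from the mapred_output_dir env variable.

    `env` and `run` are injectable (an assumption made for testing)
    so the logic can be exercised without a Hadoop installation.
    """
    hdfs_dir = env["mapred_output_dir"]
    # Stage the content on the local disk of the task node first.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        run(build_put_command(tmp.name, hdfs_dir, name))
    finally:
        # Always remove the local staging file, even if the copy fails.
        os.remove(tmp.name)
```

Inside the mapper, create_hdfs_file("testfile.txt") would then stand in for the open() call. Hadoop's documentation also describes a per-task work directory (mapred.work.output.dir) for side files; whether that is the better target here depends on the Hadoop version and is worth checking.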
Error output:
13/02/06 17:07:55 INFO streaming.StreamJob: map 100% reduce 100%
13/02/06 17:07:55 INFO streaming.StreamJob: To kill this job, run:
13/02/06 17:07:55 INFO streaming.StreamJob:
-Dmapred.job.tracker=gcn-13-88.ibnet0:54311 -kill job_201302061706_0001
13/02/06 17:07:55 INFO streaming.StreamJob: Tracking URL:
13/02/06 17:07:55 ERROR streaming.StreamJob: Job not successful. Error: #
of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
13/02/06 17:07:55 INFO streaming.StreamJob: killJob...
Streaming Command Failed!