Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> Creating files through the hadoop streaming interface


Copy link to this message
-
Creating files through the hadoop streaming interface
Hi hadoop users,

I am trying to use the streaming interface to use my python script mapper
to create some files but am running into difficulties actually creating
files on the hdfs.

I have a python script mapper with no reducers.  Currently, it doesn't even
read the input and instead reads in the env variable for the output dir
(outdir = os.environ['mapred_output_dir']) and attempts to create an empty
file at that location.  However, that appears to fail with the [vague]
error message appended to this email.

I am using the streaming interface because the python file examples seem so
much cleaner and abstract a lot of the details away for me but if I instead
need to use the java bindings (and create a mapper and reducer class) then
please let me know.  I'm still learning hadoop.  As I understand it, I
should be able to create files in hadoop but perhaps there is limited
ability while using the streaming i/o interface.

Further questions: If my mapper absolutely must send my output to stdout,
is there a way to rename the file after it has been created?

Please help.

Thanks,
-Julian

Python mapper code:
outdir = os.environ['mapred_output_dir']
f = open(outdir + "/testfile.txt", "wb")
f.close()
13/02/06 17:07:55 INFO streaming.StreamJob:  map 100%  reduce 100%
13/02/06 17:07:55 INFO streaming.StreamJob: To kill this job, run:
13/02/06 17:07:55 INFO streaming.StreamJob:
/opt/hadoop/libexec/../bin/hadoop job
 -Dmapred.job.tracker=gcn-13-88.ibnet0:54311 -kill job_201302061706_0001
13/02/06 17:07:55 INFO streaming.StreamJob: Tracking URL:
http://gcn-13-88.ibnet0:50030/jobdetails.jsp?jobid=job_201302061706_0001
13/02/06 17:07:55 ERROR streaming.StreamJob: Job not successful. Error: #
of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
task_201302061706_0001_m_000000
13/02/06 17:07:55 INFO streaming.StreamJob: killJob...
Streaming Command Failed!