MapReduce >> mail # user >> Creating files through the hadoop streaming interface


Creating files through the hadoop streaming interface
Hi hadoop users,

I am trying to use the streaming interface with a Python mapper script
to create some files, but I am running into difficulties actually
creating files on HDFS.

I have a Python mapper with no reducers.  Currently it doesn't even
read the input; instead it reads the environment variable for the output
directory (outdir = os.environ['mapred_output_dir']) and attempts to
create an empty file at that location.  However, that appears to fail
with the [vague] error message appended to this email.

I am using the streaming interface because the Python examples seem so
much cleaner and abstract a lot of the details away for me, but if I
instead need to use the Java API (and write mapper and reducer classes),
please let me know.  I'm still learning Hadoop.  As I understand it, I
should be able to create files in Hadoop, but perhaps this ability is
limited when using the streaming I/O interface.

Further question: if my mapper absolutely must send its output to
stdout, is there a way to rename the output file after it has been
created?
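(On the rename question, the option I have seen suggested is to rename the part-NNNNN files with `hadoop fs -mv` after the job finishes.  A rough sketch that just builds the command; the paths are illustrative, not from this job:)

```python
import subprocess

def rename_cmd(outdir, old_name, new_name):
    # "hadoop fs -mv <src> <dst>" renames a file within HDFS.
    return ["hadoop", "fs", "-mv",
            "%s/%s" % (outdir, old_name),
            "%s/%s" % (outdir, new_name)]

# After the job: subprocess.check_call(rename_cmd(out, "part-00000", "testfile.txt"))
```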

Please help.

Thanks,
-Julian

Python mapper code:

import os

# Try to create an empty file in the job's output directory.
outdir = os.environ['mapred_output_dir']
f = open(outdir + "/testfile.txt", "wb")
f.close()
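(For reference, one point worth noting about the script above: open() operates on the task's local filesystem, while mapred_output_dir names an HDFS path.  A workaround that is sometimes suggested is to write the file locally and then copy it into HDFS by shelling out to `hadoop fs -put`.  A rough sketch, assuming the `hadoop` executable is on the task node's PATH; the `run` parameter is only there so the command can be inspected:)

```python
import os
import subprocess
import tempfile

def put_to_hdfs(data, hdfs_path, run=subprocess.check_call):
    # open() only sees the local FS, so write to a local temp file first.
    fd, local_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Copy the local file into HDFS via the hadoop CLI.
        run(["hadoop", "fs", "-put", local_path, hdfs_path])
    finally:
        os.remove(local_path)

# e.g. put_to_hdfs(b"", os.environ["mapred_output_dir"] + "/testfile.txt")
```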
13/02/06 17:07:55 INFO streaming.StreamJob:  map 100%  reduce 100%
13/02/06 17:07:55 INFO streaming.StreamJob: To kill this job, run:
13/02/06 17:07:55 INFO streaming.StreamJob:
/opt/hadoop/libexec/../bin/hadoop job
 -Dmapred.job.tracker=gcn-13-88.ibnet0:54311 -kill job_201302061706_0001
13/02/06 17:07:55 INFO streaming.StreamJob: Tracking URL:
http://gcn-13-88.ibnet0:50030/jobdetails.jsp?jobid=job_201302061706_0001
13/02/06 17:07:55 ERROR streaming.StreamJob: Job not successful. Error: #
of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
task_201302061706_0001_m_000000
13/02/06 17:07:55 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Replies:
Harsh J 2013-02-07, 05:18
Simone Leo 2013-02-07, 15:39