
MapReduce >> mail # user >> RE: Using hadoop streaming with binary data


Venkatesh Kavuluri 2013-02-06, 21:38
Using hadoop streaming with binary data
Is it possible to pass unmolested binary data through a map-only streaming
job from the command line?  I.e., is there a way to avoid extra tabs and
newlines in the output?  I don't need input splits or key/value pairs; I
just want one whole input file fed unmodified into a program, and its
output written unmodified to HDFS.  For example, I'd like to run:

    hadoop jar hadoop-streaming.jar -mapper cat -numReduceTasks 0 -input in -output out

and have 'out' be exactly the same as 'in'.
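A simplified local model of where the extra bytes come from, assuming streaming's text defaults: TextInputFormat hands the mapper one line at a time with the terminator stripped, streaming splits each line the mapper prints at its first tab into key and value (leaving an empty value when there is no tab, an assumption consistent with the trailing tabs described above), and TextOutputFormat writes key, separator, value and a newline. The function name is hypothetical, and the model ignores details such as '\r' also counting as a line end.

```python
# Hypothetical model of `-mapper cat -numReduceTasks 0` with text input/output.
# Simplification: only b"\n" ends a line here; the real line reader also
# treats b"\r" and b"\r\n" as line ends, which would alter binary data further.
def simulate_streaming_cat(data: bytes, sep: bytes = b"\t") -> bytes:
    # Input side: the file is cut into lines and the terminators are dropped.
    lines = data.split(b"\n")
    if lines[-1] == b"":
        lines.pop()  # a final newline does not start another record
    out = bytearray()
    for line in lines:
        # `cat` echoes the line; streaming then splits it at the first tab.
        key, _, value = line.partition(b"\t")
        # Output side: key + separator + value + newline, written even when
        # the value is empty, which is where a trailing tab appears.
        out += key + sep + value + b"\n"
    return bytes(out)


print(simulate_streaming_cat(b"a\tb\n"))  # b'a\tb\n'  (unchanged)
print(simulate_streaming_cat(b"abc"))     # b'abc\t\n' (tab and newline added)
```

Under this model an empty separator would not be enough on its own: a tab inside a line is consumed by the key/value split, and a missing final newline is still added.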

There does not seem to be a way to set
mapreduce.output.textoutputformat.separator to the empty string, and
typed bytes (-io typedbytes) prepends the size to each record.  Is there a
way to leave the data alone out of the box, or will I have to write a
custom InputFormat and OutputFormat?
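The size prefix comes from the typed bytes framing itself. As a sketch, assuming the layout of the typed bytes format that ships with Hadoop streaming (a one-byte type code, 0 for a raw byte sequence, then a 4-byte big-endian length, then the bytes), a raw value is framed as below; the function names are hypothetical.

```python
import struct

# Typed bytes type code for a raw byte sequence (assumed: code 0).
TYPE_BYTES = 0


def encode_typed_bytes(payload: bytes) -> bytes:
    # 1-byte type code + 4-byte big-endian length, then the payload itself.
    return struct.pack(">Bi", TYPE_BYTES, len(payload)) + payload


def decode_typed_bytes(record: bytes) -> bytes:
    # Read the 5-byte header back and return only the payload.
    code, length = struct.unpack_from(">Bi", record, 0)
    if code != TYPE_BYTES:
        raise ValueError(f"unexpected type code {code}")
    return record[5:5 + length]


print(encode_typed_bytes(b"hi"))  # b'\x00\x00\x00\x00\x02hi'
```

Under this assumption each framed value is 5 bytes longer than its payload, so output written this way is not byte-identical to the input unless the header is stripped afterwards.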

Thanks!