RE: Using hadoop streaming with binary data
Venkatesh Kavuluri 2013-02-06, 21:38
You can use Hadoop's DistCp to copy files via MapReduce.
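A rough sketch of this suggestion, for reference. DistCp copies files as they are, so the destination should match the source, but it does not run a user program over the data, so it only covers the "leave the data alone" half of the question. The helper name `copy_unmodified` and the `DRY_RUN` switch are made up for illustration; the only real command here is `hadoop distcp <src> <dst>`.

```shell
# copy_unmodified SRC DST: copy SRC to DST with DistCp.
# Hypothetical helper; set DRY_RUN=1 to print the command instead of running it.
copy_unmodified() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "hadoop distcp $1 $2"
  else
    hadoop distcp "$1" "$2"
  fi
}
```

With the paths from the question, this would be called as `copy_unmodified in out`.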
Date: Wed, 6 Feb 2013 16:19:23 -0500
Subject: Using hadoop streaming with binary data
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Is it possible to pass unmolested binary data through a map-only streaming job from the command line? I.e., is there a way to avoid extra tabs and newlines in the output? I don't need input splits or key/value pairs, I just want one whole input file fed unmodified into a program, and its output written unmodified to HDFS. For example, I'd like to run:
hadoop jar hadoop-streaming.jar -mapper cat -numReduceTasks 0 -input in -output out
and have 'out' be exactly the same as 'in'.
There does not seem to be a way to set mapreduce.output.textoutputformat.separator to the empty string, and typedbytes prepends the size. Is there a way to leave data alone out of the box, or will I have to write a custom InputFormat and OutputFormat? Thanks!
Using hadoop streaming with binary data
Jay Hacker 2013-02-07, 15:19
Is it possible to pass unmolested binary data through a map-only streaming job from the command line? I.e., is there a way to avoid extra tabs and newlines in the output? I don't need input splits or key/value pairs, I just want one whole input file fed unmodified into a program, and its output written unmodified to HDFS. For example, I'd like to run:
hadoop jar hadoop-streaming.jar -mapper cat -numReduceTasks 0 -input in -output out
and have 'out' be exactly the same as 'in'.
There does not seem to be a way to set mapreduce.output.textoutputformat.separator to the empty string, and typedbytes prepends the size. Is there a way to leave data alone out of the box, or will I have to write a custom InputFormat and OutputFormat?
Thanks!
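One way to check whether 'out' really is exactly the same as 'in' is a byte-level comparison of local copies. A minimal sketch, assuming both have already been pulled to the local filesystem (for instance with `hadoop fs -cat`); the helper name `same_bytes` and the local file names are hypothetical:

```shell
# same_bytes A B: exit 0 only if local files A and B are byte-identical.
same_bytes() {
  cmp -s "$1" "$2"
}

# Example, run by hand (HDFS paths from the question, local names made up):
#   hadoop fs -cat in > in.local
#   hadoop fs -cat 'out/part-*' > out.local
#   same_bytes in.local out.local && echo identical || echo differs
```

Any tab or newline added to the output would make the comparison fail.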