Re: Using hadoop streaming with binary data
I was able to write a little code to make this happen, and submitted a
patch to Hadoop:


There is a jar file and shell script there for anybody who wants to try
this without recompiling all of Hadoop.  It lets you run something like
"mapstream indir md5sum outdir" and get one map task per file in indir with
real raw binary data passed to your map command and the output written to a
file in outdir.  This makes it easy to run all your favorite Unix commands
as map-only streaming jobs, taking advantage of reliable distributed