HDFS >> mail # user >> Re: Using hadoop streaming with binary data


Re: Using hadoop streaming with binary data
I was able to write a little code to make this happen, and submitted a
patch to Hadoop:

https://issues.apache.org/jira/browse/MAPREDUCE-5018

There is a jar file and shell script there for anybody who wants to try
this without recompiling all of Hadoop. It lets you run something like
"mapstream indir md5sum outdir" and get one map task per file in indir, with
the file's raw binary data passed to your map command and the output written
to a file in outdir. This makes it easy to run all your favorite Unix
commands as map-only streaming jobs, taking advantage of Hadoop's reliable
distributed execution.
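To make the per-file semantics concrete, here is a minimal local sketch in Python. It is not the patch's code. It assumes that each output file takes the same name as its input file and that the command reads its input on stdin, as streaming commands normally do. The function name `mapstream_local` is hypothetical, and it runs serially on one machine with none of Hadoop's distribution or retry behavior.

```python
import subprocess
from pathlib import Path


def mapstream_local(indir, command, outdir):
    """Hypothetical local emulation of the per-file map semantics.

    Runs `command` (a list of argv strings) once per regular file in `indir`,
    feeding that file's raw bytes to the command's stdin and writing the
    command's stdout, byte for byte, to a same-named file in `outdir`.
    Returns the list of output paths written, in sorted input order.
    """
    in_path = Path(indir)
    out_path = Path(outdir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_path.iterdir()):
        if not src.is_file():
            continue  # only regular files become "map tasks"
        dst = out_path / src.name  # assumption: output named after input
        # Binary mode on both ends: no newline or encoding translation, so
        # NUL bytes, CR/LF and arbitrary byte values pass through untouched.
        with src.open("rb") as fin, dst.open("wb") as fout:
            subprocess.run(command, stdin=fin, stdout=fout, check=True)
        written.append(dst)
    return written
```

For example, `mapstream_local("indir", ["md5sum"], "outdir")` would leave one checksum file per input file in `outdir`. Passing the command as an argv list rather than a shell string avoids quoting problems with file names and arguments.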