
Hadoop user mailing list


Sending cachefiles back from streaming interface
Looks like I can use -cacheFile or DistributedCache.addCacheFile() to send read-only files (on HDFS) to mappers and reducers.  This is particularly useful for streaming mappers and reducers.
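
As a hedged illustration of the read-only direction, assume the job was launched with something like `-cacheFile hdfs://namenode:port/path/lookup.txt#lookup.txt`. The HDFS path and the file name `lookup.txt` are hypothetical. The `#lookup.txt` suffix typically names the symlink that Hadoop creates in each task's working directory. A Python streaming mapper can then open the cached copy as an ordinary local file. The tab-separated lookup format and the function names below are assumptions made for the sketch:

```python
"""Hypothetical streaming mapper that uses a side file shipped with -cacheFile."""

# Assumed symlink name, set by the '#lookup.txt' suffix on -cacheFile.
CACHE_FILE = "lookup.txt"


def load_lookup(path):
    """Read tab-separated key/value pairs from the cached side file."""
    table = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key, _, value = line.partition("\t")
            table[key] = value
    return table


def map_lines(lines, table):
    """Yield 'key<TAB>value' records, tagging each input key with its lookup value."""
    for line in lines:
        key = line.rstrip("\n")
        if key:
            yield f"{key}\t{table.get(key, 'UNKNOWN')}"


def main(stdin, stdout, cache_path=CACHE_FILE):
    """Load the cached table once, then stream stdin records to stdout."""
    table = load_lookup(cache_path)
    for record in map_lines(stdin, table):
        stdout.write(record + "\n")
```

When the file is used as the `-mapper` script, it would end with `import sys` and a call to `main(sys.stdin, sys.stdout)`. The cached file travels only from HDFS to the task. Nothing in this mechanism carries data back.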

Question: in the streaming case, can a similar mechanism be used to send data back, or is stdout the only option? I would like to send an HDFS file reference to my streaming native code, have the code process it and produce a new file, and then send *that* file's reference back as the emitted key/value for the reducer instead of serializing the file over stdout. For one thing, these are binary files, and while I realize streaming has evolved to accept binary I/O, I am curious about the file-reference-passing approach as well.
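
The file-reference-passing idea can be sketched as follows. This is a hedged sketch, not a confirmed Hadoop feature. A local directory stands in for HDFS. `process_file` is a hypothetical placeholder for the native code that simply inverts every byte. In a real job, the produced file would have to land somewhere every reducer node can read, for example by copying it into HDFS with `hadoop fs -put`. The mapper emits only `key<TAB>path` on stdout. The hypothetical reducer then opens each referenced file itself.

One caution applies because Hadoop may rerun failed or speculative task attempts. Side files written this way should be named and written so that a duplicate attempt is harmless. Here, each attempt deterministically rewrites the same content.

```python
import os


def process_file(src_path, out_dir):
    """Hypothetical stand-in for the native processing step.

    Reads a binary file, XORs every byte with 0xFF, and writes the result to
    out_dir/<basename>.out (assumes input basenames are unique). Returns the
    path of the new file.
    """
    with open(src_path, "rb") as f:
        data = f.read()
    result = bytes(b ^ 0xFF for b in data)
    out_path = os.path.join(out_dir, os.path.basename(src_path) + ".out")
    with open(out_path, "wb") as f:
        f.write(result)
    return out_path


def map_file_refs(lines, out_dir):
    """For each input line naming a file, process it and yield 'key<TAB>path'.

    Only the path of the produced file travels over stdout; the binary
    contents stay in the filesystem for the reducer to open.
    """
    for line in lines:
        src_path = line.strip()
        if not src_path:
            continue
        key = os.path.basename(src_path)
        yield f"{key}\t{process_file(src_path, out_dir)}"


def main(stdin, stdout, out_dir):
    """Mapper entry point: read file references, emit new file references."""
    for record in map_file_refs(stdin, out_dir):
        stdout.write(record + "\n")


def reduce_file_refs(lines):
    """Hypothetical reducer: yield 'key<TAB>total_bytes' per key.

    Reads the contents of each referenced file. Streaming delivers reducer
    input sorted by key, so records with equal keys arrive adjacent.
    """
    current_key, total = None, 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, path = line.partition("\t")
        if key != current_key:
            if current_key is not None:
                yield f"{current_key}\t{total}"
            current_key, total = key, 0
        with open(path, "rb") as f:
            total += len(f.read())
    if current_key is not None:
        yield f"{current_key}\t{total}"
```

As a mapper script, the file would end with something like `main(sys.stdin, sys.stdout, "/shared/output/dir")`, where the directory is hypothetical. The design keeps stdout small and text-only. The trade-off is that the job no longer manages those side files: cleanup and failure handling become the job author's responsibility.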

Thanks.

________________________________________________________________________________
Keith Wiley     [EMAIL PROTECTED]     keithwiley.com    music.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
                                           --  Homer Simpson
________________________________________________________________________________