Sending cachefiles back from streaming interface
It looks like I can use -cacheFile or DistributedCache.addCacheFile() to send read-only files (on HDFS) to mappers and reducers. This is particularly useful for streaming mappers and reducers.

Question: in the streaming case, can a similar mechanism be used to send data back, or is stdout the only option? I would like to send an HDFS file reference to my streaming native code, have the code process the file, produce a new file, and send *that* reference back as the emitted key/value for the reducer, instead of serializing the file over stdout. For one thing, these are binary files, and while I realize streams have evolved to accept binary I/O, I am curious about the file-ref-passing approach as well.
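
To make the file-ref-passing idea concrete, here is a minimal sketch of what such a streaming mapper might look like in Python. It is only an illustration under several assumptions: each input line is "key<TAB>HDFS path", the hadoop command is on the task node's PATH, and the task is allowed to write to the output directory. The names derive_output_path, handle_record, REF_OUTPUT_DIR, and the ".processed" suffix are hypothetical, and the byte-reversing transform stands in for the real native processing.

```python
import os
import posixpath
import subprocess
import sys
import tempfile


def derive_output_path(input_path, output_dir):
    """Build the HDFS path for the processed file (hypothetical naming scheme)."""
    return posixpath.join(output_dir, posixpath.basename(input_path) + ".processed")


def handle_record(line, output_dir, transform, run=subprocess.check_call):
    """Fetch the referenced file, transform it, upload the result.

    Returns the line to emit on stdout: 'key<TAB>new HDFS path'.
    """
    key, _, input_path = line.rstrip("\n").partition("\t")
    output_path = derive_output_path(input_path, output_dir)

    workdir = tempfile.mkdtemp()
    local_in = os.path.join(workdir, "in.bin")
    local_out = os.path.join(workdir, "out.bin")

    # Copy the referenced HDFS file to local disk.
    run(["hadoop", "fs", "-get", input_path, local_in])

    # Placeholder for the real processing step.
    with open(local_in, "rb") as src, open(local_out, "wb") as dst:
        dst.write(transform(src.read()))

    # Write the new file back to HDFS; only its path goes over stdout.
    run(["hadoop", "fs", "-put", local_out, output_path])
    return key + "\t" + output_path


def main():
    # Assumption: the job passes the output directory in an environment variable.
    output_dir = os.environ.get("REF_OUTPUT_DIR", "/tmp/ref-output")
    for record in sys.stdin:
        if record.strip():
            print(handle_record(record, output_dir, lambda data: data[::-1]))

# When used as the -mapper script, call main() at the end of the file.
```

With this shape, the reducer receives only short text records and would fetch the binary files itself by path. One caveat of the design: task retries and speculative execution can run the same record more than once, so output paths should be derived deterministically (as above) or made unique per task attempt.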


Keith Wiley     keithwiley.com    music.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
                                           --  Homer Simpson