Direct HDFS access from a streaming job
Keith Wiley 2011-03-24, 05:26
The Hadoop Streaming documentation contains this passage:
How do I process files, one per map?
As an example, consider the problem of zipping (compressing) a set of files across the hadoop cluster. You can achieve this using either of these methods:
• Hadoop Streaming and custom mapper script:
• Generate a file containing the full HDFS path of the input files. Each map task would get one file name as input.
• Create a mapper script which, given a filename, will get the file to local disk, gzip the file, and put it back in the desired output directory.
I'm not trying to gzip files as in the example, but I would like to read files directly from HDFS into C++ streaming code, as opposed to passing those files as input through the streaming input interface (stdin).
I'm not sure how to reference HDFS from C++ though. I mean, how would one open an ifstream to such a file?
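For what it's worth, std::ifstream only sees the local filesystem, so an HDFS path cannot be opened with it directly. One workaround, sketched below under a few assumptions, is to shell out to `hadoop fs -cat` via POSIX popen and wrap the result in a std::istringstream, which can then be read like any other input stream. The function names (read_command_output, read_hdfs_file, run_mapper) are made up for illustration, the sketch assumes the `hadoop` command is on the PATH of every task node, that each stdin line is exactly one HDFS path, and that each file fits in memory. Hadoop's C API (libhdfs, with hdfsConnect, hdfsOpenFile and hdfsRead) would be the other route, but it needs linking against the Hadoop native libraries.

```cpp
#include <cassert>
#include <cstdio>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Runs a shell command (POSIX popen) and returns everything it wrote to stdout.
std::string read_command_output(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed: " + command);
    std::string data;
    char buffer[4096];
    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0) data.append(buffer, n);
    if (pclose(pipe) != 0) throw std::runtime_error("command failed: " + command);
    return data;
}

// Hypothetical helper: fetches one HDFS file's contents through the Hadoop CLI.
// cat_command is a parameter only so the sketch can be tried without a cluster.
// Assumption: the path contains no single quotes (it is quoted naively).
std::string read_hdfs_file(const std::string& hdfs_path,
                           const std::string& cat_command = "hadoop fs -cat") {
    assert(!hdfs_path.empty() && "HDFS path must not be empty");
    return read_command_output(cat_command + " '" + hdfs_path + "'");
}

// Hypothetical mapper loop: reads one HDFS path per input line and emits
// "path<TAB>line count". Replace the counting with the real processing.
void run_mapper(std::istream& in, std::ostream& out,
                const std::string& cat_command = "hadoop fs -cat") {
    std::string path;
    while (std::getline(in, path)) {
        if (path.empty()) continue;  // skip blank input lines
        // The istringstream stands in for the ifstream asked about above.
        std::istringstream file(read_hdfs_file(path, cat_command));
        std::string line;
        long lines = 0;
        while (std::getline(file, line)) ++lines;
        out << path << '\t' << lines << '\n';
    }
}
```

In a streaming job the mapper's main() would just call run_mapper(std::cin, std::cout). Spawning a JVM per file through the CLI is slow, so this only seems reasonable when there are relatively few, large files.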
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
"Luminous beings are we, not this crude matter."