Hadoop >> mail # user >> Managing stdout in streaming


Managing stdout in streaming
So streaming uses stdout to carry the mapper/reducer output, one record per line, with each key/value pair split at the first TAB.

(Presumably multiple TABs are permitted and simply become embedded in the value string; I haven't experimented with this yet.)
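To make the record convention concrete, here is a minimal sketch in Python of splitting a line at the first TAB only. It illustrates the format as described above, not Hadoop's own parsing code, and the function names `parse_record` and `format_record` are hypothetical.

```python
def parse_record(line):
    """Split one streaming line into (key, value) at the first TAB only."""
    # partition() stops at the first separator, so any later TABs
    # remain part of the value string.
    key, _sep, value = line.rstrip("\n").partition("\t")
    return key, value


def format_record(key, value):
    """Build one output line in the key<TAB>value format."""
    return f"{key}\t{value}\n"
```

With this sketch, a line containing two TABs yields a value that still contains the second TAB, and a line with no TAB at all yields the whole line as the key and an empty value.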

Obviously, one must be very careful not to write any debugging or logging output to stdout.  It seems fairly straightforward to simply use stderr instead, such that all associated output appears in the task logs (reachable through the job tracker).
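As a rough illustration of keeping the two streams apart, the sketch below sends diagnostics to stderr and yields only records for stdout. The word-count style logic and the names `log` and `map_records` are assumptions made up for the example.

```python
import sys


def log(message):
    """Diagnostics go to stderr so they never mix with records on stdout."""
    print(message, file=sys.stderr)


def map_records(lines, log=log):
    """Yield one 'key<TAB>value' line per non-blank input line."""
    for line in lines:
        word = line.strip()
        if not word:
            # A note for the logs, never a record.
            log("skipping blank line")
            continue
        yield f"{word}\t1\n"
```

In an actual mapper script, this would be driven by something like `sys.stdout.writelines(map_records(sys.stdin))`.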

But what if I'm using a third-party library and I can't tell it to send output elsewhere?  I know that it is possible to redirect stdout using tricks like freopen(), but I believe it can be quite tricky to redirect stdout back to its original stream.  So if I directed stdout away from the original stream for processing, I'm not sure how I would latch it back onto the stream for the purpose of generating my mapper/reducer output data (in the Hadoop streaming TAB-delimited line-per-record format).
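One approach that may work on Linux is to redirect at the file-descriptor level rather than with freopen(): keep a duplicate of the original descriptor, point fd 1 at stderr while the library runs, then copy the saved descriptor back. The sketch below shows this in Python, assuming the noisy library writes to file descriptor 1. The names `redirect_fd` and `call_quietly` are hypothetical. `os.dup` and `os.dup2` wrap the POSIX calls of the same names, so the same sequence should carry over to C.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def redirect_fd(fd, to_fd):
    """Temporarily make descriptor `fd` refer to whatever `to_fd` refers to."""
    saved = os.dup(fd)        # keep a handle on the original stream
    try:
        os.dup2(to_fd, fd)    # fd now points at to_fd's target
        yield
    finally:
        os.dup2(saved, fd)    # put the original stream back on fd
        os.close(saved)


def call_quietly(func, *args):
    """Run func with fd 1 sent to fd 2 (stderr), then restore fd 1."""
    sys.stdout.flush()        # push out pending records before switching
    with redirect_fd(1, 2):
        result = func(*args)
        sys.stdout.flush()    # anything buffered during the call goes to stderr
    return result
```

The flushes matter because buffered output is written to whichever stream fd 1 points at when the buffer is emptied, not when the text was produced. A C library with its own stdio buffer would need an `fflush(stdout)` on the C side at the same two points.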

Any thoughts on this?  The cluster is running Linux incidentally.  I realize details like that become important when one starts fiddling with redirecting streams and such.

Thank you.

________________________________________________________________________________
Keith Wiley               [EMAIL PROTECTED]               www.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
  -- Keith Wiley
________________________________________________________________________________