Good call. We can't use the conventional web-based JT due to corporate access issues, but I looked at the job_XXX.xml file directly, and sure enough, it sets mapred.output.compress to true. Now I just need to work out how that happened: I ran the wordcount example straight off the command line and didn't specify any overridden conf settings for the job.
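For reference, when the JT UI isn't reachable, the job.xml can be parsed directly to see every setting the job actually ran with. A minimal sketch (the sample XML below is fabricated for illustration; a real job_XXX.xml has hundreds of <property> entries in the same flat format):

```python
import xml.etree.ElementTree as ET

# Fabricated stand-in for a real job_XXX.xml pulled off the JobTracker's
# local dirs; only the structure matters here.
sample = """<?xml version="1.0"?>
<configuration>
  <property><name>mapred.output.compress</name><value>true</value></property>
  <property><name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value></property>
</configuration>"""

root = ET.fromstring(sample)
props = {p.findtext("name"): p.findtext("value")
         for p in root.findall("property")}

# List every compression-related setting the job was submitted with.
for name in sorted(props):
    if "compress" in name:
        print(name, "=", props[name])
```

This makes it easy to diff a job's effective config against the cluster's mapred-site.xml and spot where an unexpected value like this one is coming from.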
Ultimately, the solution (or part of it) is to move off 0.19 to a more up-to-date version of Hadoop. I would actually prefer 2.0 over 1.0, but due to a remarkable lack of concise EC2/Hadoop documentation (and the fact that what docs I did find were very old and therefore written against 0.19-style Hadoop), I have fallen back on old versions for my initial tests. In the long run, I will need a more modern version of Hadoop to deploy successfully on EC2.
On Feb 14, 2013, at 15:02 , Harsh J wrote:
> Did the job.xml of the job that produced this output also carry
> mapred.output.compress=false in it? The file should be viewable on the
> JT UI page for the job. Unless explicitly turned on, even 0.19
> wouldn't have enabled compression on its own.
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
"The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also."
-- Mark Twain