Hadoop, mail # user - TextInputFormat and Gzip encoding - wordcount displaying binary data


TextInputFormat and Gzip encoding - wordcount displaying binary data
Saptarshi Guha 2011-03-21, 22:47
Hello,

It's frustrating to be dealing with these simple problems (and I know
the fault is mine; I'm missing something).
I'm running wordcount (from 0.20.2) on a very small gzip file, and the
output contains binary characters.
When I run the same job on the ungzipped file, the output is correct ASCII.

I'm using the native gzip library. The command is

 hadoop jar /usr/lib/hadoop-0.20/hadoop-examples-0.20.2-CDH3B4.jar \
     wordcount /user/sguha/tmp/o.zip /user/sguha/tmp/o.wc.zip

(o.zip is a gzip file, despite the .zip extension)

Any ideas?

Thanks
SG
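
One possible explanation, offered as an assumption rather than something confirmed in this thread: Hadoop's text input path picks a decompression codec from the file name suffix, and the gzip codec is registered for .gz, so a gzip file named o.zip may be read as raw bytes. The Python sketch below only imitates that behaviour. The function names are hypothetical and none of this is Hadoop code. It shows that a gzip stream can be recognised by its two leading magic bytes, and that skipping decompression because of the file name yields binary-looking text.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data: bytes) -> bool:
    """Return True when the data begins with the gzip magic bytes."""
    return data[:2] == GZIP_MAGIC


def read_by_suffix(name: str, data: bytes) -> str:
    """Hypothetical reader that decompresses only when the name ends in .gz.

    This imitates suffix-based codec selection; it is an assumption about
    the cause of the problem, not the actual Hadoop implementation.
    """
    if name.endswith(".gz"):
        data = gzip.decompress(data)
    # latin-1 maps every byte to a character, so raw bytes never fail to decode.
    return data.decode("latin-1")


# A tiny gzip payload standing in for the small input file.
text = b"hello world hello\n"
payload = gzip.compress(text)

as_gz = read_by_suffix("o.gz", payload)    # decompressed, readable text
as_zip = read_by_suffix("o.zip", payload)  # raw gzip bytes, looks binary
```

If that assumption holds, renaming the input so that it ends in .gz (for example with hadoop fs -mv) and rerunning the job would be the first thing to try.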
Niels Basjes 2011-03-21, 23:01
Saptarshi Guha 2011-03-21, 23:10