Hadoop >> mail # user >> TextInputFormat and Gzip encoding - wordcount displaying binary data


TextInputFormat and Gzip encoding - wordcount displaying binary data
Hello,

It's frustrating to be dealing with these simple problems (and I know
the fault is mine; I'm missing something).
I'm running wordcount (from 0.20.2) on a very small gzip file, and the
output contains binary characters.
When I run the same job on the uncompressed file, the output is correct ASCII.

I'm using the native gzip library. The command is:

 hadoop jar /usr/lib/hadoop-0.20/hadoop-examples-0.20.2-CDH3B4.jar \
     wordcount /user/sguha/tmp/o.zip /user/sguha/tmp/o.wc.zip

(The .zip file is actually gzip-compressed.)
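
One thing that may be worth checking here: as far as I know, Hadoop's CompressionCodecFactory picks a codec from the file name suffix, and GzipCodec registers ".gz", so a gzip file named o.zip could be read as raw bytes. The sketch below is not Hadoop code; it is a small standalone Python check, with hypothetical helper names (looks_like_gzip, has_gz_suffix), that compares what the file contains against what its name suggests.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def has_gz_suffix(path):
    """Return True if the name ends in .gz.

    Assumption: this mirrors the suffix that Hadoop's GzipCodec registers;
    it is a simplified stand-in, not the actual codec lookup.
    """
    return path.endswith(".gz")


if __name__ == "__main__":
    # Build a tiny gzip file named like the one in the post (o.zip).
    tmpdir = tempfile.mkdtemp()
    sample = os.path.join(tmpdir, "o.zip")
    with gzip.open(sample, "wb") as f:
        f.write(b"hello world hello\n")

    # A mismatch (gzip content, non-.gz name) is the case to look out for.
    print("gzip content:", looks_like_gzip(sample))
    print(".gz suffix:  ", has_gz_suffix(sample))
```

If the content is gzip but the suffix is not .gz, renaming the input to o.gz before running wordcount would be a cheap experiment.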

Any ideas?

Thanks
SG