Hadoop user mailing list: TextInputFormat and Gzip encoding - wordcount displaying binary data


Saptarshi Guha 2011-03-21, 22:47
Niels Basjes 2011-03-21, 23:01
Re: TextInputFormat and Gzip encoding - wordcount displaying binary data
True, my naming is
Hmm, now I know.
Thanks.

On Mon, Mar 21, 2011 at 4:01 PM, Niels Basjes <[EMAIL PROTECTED]> wrote:
> Hi,
>
> 2011/3/21 Saptarshi Guha <[EMAIL PROTECTED]>:
>> It's frustrating to be dealing with these simple problems (and I know
>> the fault is mine, I'm missing something).
>> I'm running word count (from 0.20.2) on a gzip file (very small), and
>> the output has binary characters.
>> When I run the same on the ungzipped file, the output is correct ASCII.
>>
>> I'm using the native gzip library. The command is
>>
>>  hadoop jar /usr/lib/hadoop-0.20/hadoop-examples-0.20.2-CDH3B4.jar
>> wordcount /user/sguha/tmp/o.zip /user/sguha/tmp/o.wc.zip
>>
>> (zip is gzip)
>
> No, .zip is "pkzip" and .gz is gzip.
>
> The applicable Hadoop code actually chooses the decompressor based on
> the extension of the filename.
>
> --
> Niels Basjes
>
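
To illustrate the point Niels makes about extension-based selection, here is a minimal Python sketch. It is not Hadoop code: the suffix table and the `read_as_text_input` helper are hypothetical stand-ins, and the set of codecs a real cluster recognizes depends on its configuration. The sketch only shows why a gzip stream stored under a `.zip` name would reach the line reader still compressed.

```python
import gzip

# Hypothetical suffix table, loosely modeled on extension-based codec lookup.
# Assumption: only ".gz" is registered here; a real cluster may know more.
SUFFIX_TO_DECOMPRESSOR = {
    ".gz": gzip.decompress,
}


def read_as_text_input(filename, raw_bytes):
    """Return the bytes a line reader would see for this file name."""
    for suffix, decompress in SUFFIX_TO_DECOMPRESSOR.items():
        if filename.endswith(suffix):
            return decompress(raw_bytes)
    # No suffix matched: the bytes are passed through undecoded,
    # regardless of what the content actually is.
    return raw_bytes


# The same gzip stream, stored under two different names.
payload = gzip.compress(b"hello world\n")

as_zip = read_as_text_input("o.zip", payload)  # still compressed bytes
as_gz = read_as_text_input("o.gz", payload)    # decoded text
```

Under these assumptions, `as_gz` holds the original text while `as_zip` is the untouched gzip stream, which is what a word count would then tokenize as binary data. Renaming the input so that it ends in `.gz` is the change the reply points toward.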