MapReduce >> mail # user >> zlib does not uncompress gzip during MR run


zlib does not uncompress gzip during MR run
Hi,

My input files are gzipped, and I am using the built-in Java codecs
successfully to uncompress them in a plain Java run...

        // open the split's file and look up a codec by file extension
        fileIn = fs.open(fsplit.getPath());
        codec = compressionCodecs.getCodec(fsplit.getPath());
        // wrap in a decompressing stream only when a codec matched
        in = new LineReader(codec != null ? codec.createInputStream(fileIn) : fileIn, config);

But when I use the same piece of code in a MR job I am getting...

12/10/23 11:02:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/23 11:02:25 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/23 11:02:25 INFO compress.CodecPool: Got brand-new compressor
12/10/23 11:02:26 INFO mapreduce.HFileOutputFormat: Incremental table output configured.
12/10/23 11:02:26 INFO input.FileInputFormat: Total input paths to process : 3
12/10/23 11:02:27 INFO mapred.JobClient: Running job: job_201210221549_0014
12/10/23 11:02:28 INFO mapred.JobClient:  map 0% reduce 0%
12/10/23 11:02:49 INFO mapred.JobClient: Task Id : attempt_201210221549_0014_m_000003_0, Status : FAILED
java.io.IOException: incorrect header check
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
    at java.io.InputStream.read(InputStream.java:101)

So I am thinking there is some incompatibility between zlib and my gzip
files. Is there a way to force Hadoop to use the built-in Java compression codecs?
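(One thing I found: Hadoop has an `io.native.lib.available` property that controls whether the native libraries are used at all; I have not verified that setting it to false is enough to fall back to the pure-Java codec path, but it looks like the relevant knob, e.g. in core-site.xml:)

```xml
<!-- sketch: disable native-hadoop libraries so codecs fall back to
     their Java implementations; unverified that this fixes the issue -->
<property>
  <name>io.native.lib.available</name>
  <value>false</value>
</property>
```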

Also, I would like to try LZO, which I hope will allow splitting of the
input files (I recall reading this somewhere). Can someone point me to the
best way to do this?
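(From what I have read, LZO support comes from the third-party hadoop-lzo library rather than Hadoop itself, and the codec classes get registered in core-site.xml along the lines below; the `com.hadoop.compression.lzo.*` class names are from that project and would need its jar and native library on every node. I also gather that `.lzo` files only become splittable after building an index with the library's indexer tool.)

```xml
<!-- sketch: register LZO codecs alongside the defaults;
     assumes the third-party hadoop-lzo library is installed -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
```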

Thanks,

Jon