Re: Hadoop stream gzipped file with AvroAsTextInputFormat
I think I've figured out how to make this work.
Initially I have a file "data.avro". I gzip it as "data.avro.gz" and try
to feed it to Hadoop. This does not work.
Avro, however, supports the "deflate" codec natively. So I transcode the
file into "data_deflate.avro" and feed that to Hadoop, and it works
correctly. The file is slightly larger than when I gzip it as a whole.
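
My guess at why the first attempt fails: a gzipped file no longer starts
with the Avro container header, so an Avro reader cannot recognise it. A
minimal Python sketch to check which kind of file you have before
submitting a job. The function name sniff() is made up for illustration;
the only facts it relies on are the two magic numbers (an Avro container
file begins with the bytes "Obj" followed by 0x01, and a gzip stream
begins with 0x1f 0x8b).

```python
import gzip
import os
import tempfile

AVRO_MAGIC = b"Obj\x01"   # first four bytes of an Avro object container file
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def sniff(path):
    """Classify a file by its leading bytes: 'avro', 'gzip', or 'unknown'.

    Hypothetical helper, not part of Avro or Hadoop.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head == AVRO_MAGIC:
        return "avro"
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py data.avro data.avro.gz data_deflate.avro
    for name in sys.argv[1:]:
        print(name, sniff(name))
```

A file recoded with the deflate codec should still report "avro", since
the compression is applied to the data blocks inside the container and
the header stays readable. A file gzipped as a whole reports "gzip".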
I used avro-tools to do the transcoding. Its command-line handling is
irregular, and it took me a lot of trial and error to get it to work.
The command that works for me is
java -jar avro-tools-1.7.6.jar recodec --codec=deflate input.avro