Avro >> mail # user >> Hadoop stream gzipped file with AvroAsTextInputFormat
Re: Hadoop stream gzipped file with AvroAsTextInputFormat
I think I've figured out how to make this work.

Initially I had a file "data.avro". I gzipped it as "data.avro.gz" and tried
to feed it to Hadoop. That did not work.
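The post doesn't say exactly why the gzipped file failed. One plausible explanation (an assumption, not confirmed in the thread) is that AvroAsTextInputFormat reads each input directly as an Avro object container file. Such a file must begin with the 4-byte magic `Obj` followed by byte 0x01, whereas a gzip stream begins with 0x1f 0x8b. A minimal Python sketch for checking which kind of file you have; `sniff_format` and `sniff_file` are hypothetical helper names:

```python
GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream (RFC 1952)
AVRO_MAGIC = b"Obj\x01"    # first four bytes of an Avro object container file


def sniff_format(head: bytes) -> str:
    """Classify data by its leading bytes: 'gzip', 'avro' or 'unknown'."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(AVRO_MAGIC):
        return "avro"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first four bytes of the file and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(4))
```

With the files from this thread, `sniff_file("data.avro.gz")` should report `gzip`, while `sniff_file("data_deflate.avro")` should report `avro`.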

Avro, however, supports the "deflate" codec natively. So I transcoded the
file into "data_deflate.avro", fed that to Hadoop, and it worked correctly.
The file is slightly larger than it would be if I gzipped it as a whole.
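The size difference is consistent with how the deflate codec works in an Avro container: records are stored in blocks, each block is compressed independently with raw deflate (RFC 1951), and each block is followed by a 16-byte sync marker. A block therefore can't back-reference data in earlier blocks, while gzip over the whole file can. The Python sketch below illustrates only the compression part. The sample records, the block size of 100 records and compression level 6 are made-up assumptions, and it does not write a real Avro file:

```python
import gzip
import zlib


def make_records(n):
    # Hypothetical, highly repetitive records standing in for Avro data.
    return [b"id=%06d,name=alice,city=paris,status=active\n" % i for i in range(n)]


def whole_file_gzip_size(records, level=6):
    # One gzip stream over everything: matches may reach back across records.
    return len(gzip.compress(b"".join(records), compresslevel=level))


def per_block_deflate_size(records, records_per_block, level=6):
    total = 0
    for start in range(0, len(records), records_per_block):
        block = b"".join(records[start:start + records_per_block])
        # Negative wbits selects raw deflate (no zlib header). A fresh
        # compressor per block means no history is shared between blocks.
        comp = zlib.compressobj(level, zlib.DEFLATED, -15)
        total += len(comp.compress(block) + comp.flush())
    return total


records = make_records(5000)
print("gzip, whole file:  ", whole_file_gzip_size(records))
print("deflate, per block:", per_block_deflate_size(records, 100))
```

The extra bytes buy something useful. Because blocks are self-contained and delimited by sync markers, Hadoop can split an Avro container file across mappers, which it cannot do with a single gzip stream.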

I used avro-tools to do the transcoding. Its command-line handling is
irregular, and it took me a lot of trial and error to get it to work. The
command that works for me is:

   java -jar avro-tools-1.7.6.jar recodec --codec=deflate input.avro output.avro
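To confirm that the recodec step did what you expected, you can inspect the `avro.codec` entry in the container header. Per the Avro specification, the header is the magic bytes followed by a metadata map (an absent `avro.codec` entry means `null`, i.e. uncompressed) and then a 16-byte sync marker. A minimal stdlib-only Python sketch, assuming a well-formed file; `avro_codec` and `codec_from_bytes` are hypothetical helper names, not part of avro-tools:

```python
import io

AVRO_MAGIC = b"Obj\x01"


def _read_long(stream):
    """Decode one Avro 'long': a zigzag-encoded variable-length integer."""
    shift = acc = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("truncated varint")
        acc |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)


def _read_bytes(stream):
    """Decode Avro 'bytes'/'string': a long length followed by the data."""
    n = _read_long(stream)
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated bytes field")
    return data


def read_header_metadata(stream):
    """Return the header metadata map as {str: bytes}."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = _read_long(stream)   # a map is a series of counted blocks
        if count == 0:               # a zero count ends the map
            break
        if count < 0:                # negative count: a byte size follows
            count = -count
            _read_long(stream)       # block size in bytes, not needed here
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            meta[key] = _read_bytes(stream)
    return meta


def codec_from_bytes(data):
    meta = read_header_metadata(io.BytesIO(data))
    return meta.get("avro.codec", b"null").decode("ascii")


def avro_codec(path):
    with open(path, "rb") as f:
        meta = read_header_metadata(f)
    return meta.get("avro.codec", b"null").decode("ascii")
```

After the command above, `avro_codec("output.avro")` should report `deflate`.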

Wai Yip