Avro >> mail # user >> Generating snappy compressed avro files as hadoop map reduce input files


Re: Generating snappy compressed avro files as hadoop map reduce input files
I am not sure I understand how your problem relates to the way temporary
data is stored after the map phase.

However, I guess you are looking for DataFileWriter and its setCodec
method.
http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#setCodec%28org.apache.avro.file.CodecFactory%29
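
For instance, here is a minimal sketch, assuming the Avro Java library and
snappy-java are both on the classpath. The Event schema, the class name
SnappyAvroWriter, and the helpers write and codecOf are made up for
illustration; the one thing to watch is that setCodec has to be called
before create.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

final class SnappyAvroWriter {
    // Hypothetical record schema, used only for this illustration.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"long\"},"
        + "{\"name\":\"message\",\"type\":\"string\"}]}");

    // Writes `count` sample records to `out` as a snappy-compressed Avro container file.
    static void write(File out, int count) throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            // The codec must be set before create(); it is recorded in the file header.
            writer.setCodec(CodecFactory.snappyCodec());
            writer.create(SCHEMA, out);
            for (int i = 0; i < count; i++) {
                GenericRecord record = new GenericData.Record(SCHEMA);
                record.put("id", (long) i);
                record.put("message", "event-" + i);
                writer.append(record);
            }
        }
    }

    // Reads the codec name back from the file header metadata.
    static String codecOf(File in) throws IOException {
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(in, new GenericDatumReader<GenericRecord>())) {
            return reader.getMetaString("avro.codec");
        }
    }
}
```

The file written this way carries its codec name in the header metadata
(avro.codec), which is what codecOf reads back, so a reader does not need to
be told that the blocks are snappy-compressed.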

Regards

Bertrand

PS: A snappy-compressed Avro file is not an ordinary file that was compressed
afterwards; it is a container file whose data blocks are individually
compressed. The principle is similar to SequenceFile's. Maybe that's what you
mean by a different snappy codec?

On Sun, Oct 13, 2013 at 5:16 PM, David Ginzburg <[EMAIL PROTECTED]> wrote:

>  Hi,
>
> I am writing an application that produces Avro record files, to be stored
> on AWS S3 as possible input to EMR.
> I would like to compress them with the snappy codec before storing them on S3.
> It is my understanding that Hadoop currently uses a different snappy
> codec, mostly used as the intermediate map output format.
> My question is: how can I generate snappy-compressed Avro files within my
> application logic (not MR)?
>