Avro >> mail # user >> Enabling compression


Vinod Jammula 2013-04-09, 05:47
Re: Enabling compression
Hi Vinod,

In Avro, compression is provided only at the file container level
(i.e. block compression).
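To illustrate the container-level option, here is a minimal sketch (my own, not from this thread; the record schema, field name, and deflate level are illustrative choices) that enables block compression on an Avro data file via DataFileWriter.setCodec() and reads a record back:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroContainerCompression {
    // Hypothetical single-field schema, just for illustration.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Line\",\"fields\":"
        + "[{\"name\":\"csv\",\"type\":\"string\"}]}");

    // Writes one record into a deflate-compressed Avro container (in memory).
    public static byte[] writeCompressed(String csv) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(SCHEMA));
        writer.setCodec(CodecFactory.deflateCodec(6)); // must be set before create()
        writer.create(SCHEMA, out);
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("csv", csv);
        writer.append(record);
        writer.close();
        return out.toByteArray();
    }

    // Reads the first record back; decompression is transparent to the reader.
    public static String readFirst(byte[] container) throws IOException {
        DataFileStream<GenericRecord> reader = new DataFileStream<GenericRecord>(
            new ByteArrayInputStream(container),
            new GenericDatumReader<GenericRecord>());
        String csv = reader.next().get("csv").toString();
        reader.close();
        return csv;
    }
}
```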

For compressing a simple byte array, you can rely on Hadoop's
compression classes such as GzipCodec [1] to compress the byte
stream directly (wrapping it in a compressed output stream [2]
obtained via the codec's helper method [3]).

Something like this, for example (I've not tested it out):

// Requires org.apache.hadoop.conf.Configuration, org.apache.hadoop.util.ReflectionUtils,
// and org.apache.hadoop.io.compress.GzipCodec on the classpath.
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, new Configuration());
OutputStream compressedOutputStream = codec.createOutputStream(outputStream);
[… Encode over compressedOutputStream, etc. …]

[1] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/GzipCodec.html
[2] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressorStream.html
[3] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/GzipCodec.html#createOutputStream(java.io.OutputStream)
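For completeness, the same wrap-the-stream idea can be sketched with the plain JDK gzip classes (my own sketch, not from the reply above), for cases where Hadoop's codec classes are not on the classpath:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Compress a serialized byte array (e.g. the Avro-encoded bytes).
    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write(raw);
        gzip.close(); // finishes the gzip stream; required before reading out
        return out.toByteArray();
    }

    // Decompress back to the original bytes before handing them to a Decoder.
    public static byte[] decompress(byte[] compressed) throws IOException {
        GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gzip.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```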

On Tue, Apr 9, 2013 at 11:17 AM, Vinod Jammula
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a CSV string which I want to serialize, compress, and write to a
> database.
>
> I have the following code to serialize the string:
>
> ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
> Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
> GenericDatumWriter<GenericRecord> w = new GenericDatumWriter<GenericRecord>(schema);
> w.write(record, e);
> e.flush();
> byte[] avroBytes = outputStream.toByteArray();
>
>
> The following code de-serializes and processes the record:
>
> DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
>
> Decoder decoder = DecoderFactory.get().binaryDecoder(avroBytes, null);
>
> GenericRecord record = reader.read(decoder, null);
>
>
> I see that compression is supported by DataFileWriter and DataFileReader, but
> how do I enable compression for an Avro-serialized buffer?
>
> Thanks and Regards,
> Vinod

--
Harsh J