Avro, mail # user - Enabling compression


Re: Enabling compression
Harsh J 2013-04-09, 08:43
Hi Vinod,

In Avro, compression is provided only at the file container level
(i.e. block compression).

For compressing a simple byte array, you can rely on Hadoop's
compression classes, such as GzipCodec [1], to compress the byte
stream directly, by wrapping it in a compressed output stream [2]
obtained from the codec's helper method [3].

Something like this, for example (I've not tested it out):

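// Needs: java.io.ByteArrayOutputStream, java.io.OutputStream,
// org.apache.hadoop.conf.Configuration, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.util.ReflectionUtils
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, new Configuration());
OutputStream compressedOutputStream = codec.createOutputStream(outputStream);
// [… Encode over compressedOutputStream, etc. …]
// If an Avro encoder wrote to the stream, flush it first; closing then writes the gzip trailer.
compressedOutputStream.close();
byte[] compressedBytes = outputStream.toByteArray();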
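If you'd rather not pull in Hadoop just for this, the JDK's java.util.zip.GZIPOutputStream wraps a stream in the same way. A minimal sketch, swapping the JDK class in for GzipCodec (GzipBytes is a hypothetical helper name):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip-compresses a byte array entirely in memory. */
public final class GzipBytes {
    private GzipBytes() {}

    public static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // try-with-resources closes the gzip stream, which finishes
        // the deflate data and writes the gzip trailer (CRC + size)
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(input);
        }
        // ByteArrayOutputStream stays readable after close()
        return buffer.toByteArray();
    }
}
```

You would then store GzipBytes.compress(avroBytes) in the database instead of avroBytes.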

[1] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/GzipCodec.html
[2] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressorStream.html
[3] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/GzipCodec.html#createOutputStream(java.io.OutputStream)
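To read a record back, reverse the order: decompress the stored bytes first, then hand the result to DecoderFactory.get().binaryDecoder(...) as in your deserialization code. A minimal JDK-only sketch (GunzipBytes is a hypothetical helper name). Since GzipCodec writes the standard gzip format, its output should decompress this way too, but treat that as an assumption rather than something tested here:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: undoes gzip compression of an in-memory buffer. */
public final class GunzipBytes {
    private GunzipBytes() {}

    public static byte[] decompress(byte[] compressed) throws IOException {
        // The GZIPInputStream constructor validates the gzip header and
        // throws a ZipException (an IOException) on non-gzip input.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes(); // Java 9+; reads to the end of the decompressed stream
        }
    }
}
```

For example, binaryDecoder(GunzipBytes.decompress(storedBytes), null), where storedBytes (a hypothetical name) is whatever was read from the database.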

On Tue, Apr 9, 2013 at 11:17 AM, Vinod Jammula
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a CSV string which I want to serialize, compress and write to a
> database.
>
> I have the following code to serialize the string:
>
> ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
> Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
> GenericDatumWriter<GenericRecord> w = new GenericDatumWriter<GenericRecord>(schema);
> w.write(record, e);
> e.flush(); // the binary encoder buffers its output; flush before reading the bytes
> byte[] avroBytes = outputStream.toByteArray();
>
>
> And the following code to deserialize and process the record:
>
> DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
> Decoder decoder = DecoderFactory.get().binaryDecoder(avroBytes, null);
> // DatumReader.read takes (reuse, decoder); null means allocate a new record
> GenericRecord record = reader.read(null, decoder);
>
>
> I see that DataFileWriter and DataFileReader support compression, but how
> can I enable compression for an Avro-serialized buffer?
>
> Thanks and Regards,
> Vinod

--
Harsh J