Re: map/reduce of compressed Avro
Martin Kleppmann 2013-04-23, 10:38
To my knowledge, LZO is not a supported codec for Avro data files. It's
possible that you have an LZO-compressed Hadoop sequence file containing
Avro records, but that would be a format you defined yourself, and not the
same as an Avro data file.
Avro data files are designed to be splittable regardless of the codec they
use, so multiple mappers can each consume a portion of the same input file.
The format achieves this by breaking the data into blocks, compressing each
block separately, and writing a sync marker between blocks; hence the file
can be split at block boundaries.
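To make the block-splitting idea concrete, here is a minimal Python sketch of a toy container in the same spirit: records are grouped into blocks, each block is compressed on its own, and a sync marker follows every block. The layout (fixed-width 4-byte counts and lengths) and the names write_container and read_split are hypothetical; this is not byte-compatible with the real Avro encoding, and zlib simply stands in for whatever codec is configured. A reader given the byte range [start, end) skips to the first block boundary at or after start and decodes whole blocks until the next block would begin at or past end, so every block is read by exactly one split.

```python
import io
import os
import struct
import zlib

SYNC_SIZE = 16  # length of the sync marker, in bytes


def write_container(records, block_size, compress=zlib.compress):
    """Write a list of byte strings into a toy block container.

    Hypothetical layout (NOT the real Avro encoding):
      header:     16-byte random sync marker
      each block: record count (4 bytes), compressed length (4 bytes),
                  compressed payload, then the sync marker again
    Inside a payload, each record is prefixed with its 4-byte length.
    """
    sync = os.urandom(SYNC_SIZE)
    out = io.BytesIO()
    out.write(sync)
    for i in range(0, len(records), block_size):
        chunk = records[i:i + block_size]
        raw = b"".join(struct.pack(">I", len(r)) + r for r in chunk)
        payload = compress(raw)  # each block is compressed independently
        out.write(struct.pack(">II", len(chunk), len(payload)))
        out.write(payload)
        out.write(sync)
    return out.getvalue()


def read_split(data, start, end, decompress=zlib.decompress):
    """Return the records of every block whose first byte lies in [start, end)."""
    sync = data[:SYNC_SIZE]
    if start <= SYNC_SIZE:
        pos = SYNC_SIZE  # the first block starts right after the header
    else:
        # A block starting at p is preceded by a sync marker at p - SYNC_SIZE,
        # so search from start - SYNC_SIZE to find the first block at/after start.
        idx = data.find(sync, start - SYNC_SIZE)
        if idx == -1:
            return []
        pos = idx + SYNC_SIZE
    records = []
    while pos < end and pos < len(data):
        count, length = struct.unpack_from(">II", data, pos)
        pos += 8
        raw = decompress(data[pos:pos + length])
        pos += length + SYNC_SIZE  # skip the payload and its trailing sync marker
        off = 0
        for _ in range(count):
            (n,) = struct.unpack_from(">I", raw, off)
            records.append(raw[off + 4:off + 4 + n])
            off += 4 + n
    return records
```

For example, cutting the file into four equal byte ranges and reading each range independently (as four mappers would) yields every record exactly once, in order. Because splitting relies only on the sync markers between blocks, swapping in a different compress/decompress pair (for example a Snappy binding) would not change how the file is split.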
On 22 April 2013 23:47, nir_zamir <[EMAIL PROTECTED]> wrote:
> Thanks Martin.
> What will happen if I try to use an indexed LZO-compressed Avro file? Will
> it work and utilize the index to allow multiple mappers?
> I think that with Snappy, for example, the file is splittable and can use
> multiple mappers, but I haven't tested it yet; I'd be glad to hear from
> anyone who has experience with that.
> Sent from the Avro - Users mailing list archive at Nabble.com.