Re: map/reduce of compressed Avro
Out of curiosity, are there any other file formats that provide splittable
gzip compression like Avro object containers?  I can only think of
Sequence Files.

On 4/29/13 3:47 PM, "Scott Carey" <[EMAIL PROTECTED]> wrote:

>Martin said it already, but I will emphasize:
>Avro data files are splittable and can support multiple mappers no matter
>what codec is used for compression.  This is because Avro files are block
>based, and compression is applied only within each block.  I recommend
>starting with deflate (gzip-style) compression, and moving to snappy only
>if deflate compression level '1' is not fast enough.
>For more information on avro data files, see:
>On 4/22/13 11:47 PM, "nir_zamir" <[EMAIL PROTECTED]> wrote:
>>Thanks Martin.
>>What will happen if I try to use an indexed LZO-compressed Avro file? Will
>>it work and utilize the index to allow multiple mappers?
>>I think that for Snappy, for example, the file is splittable and can use
>>multiple mappers, but I haven't tested it yet - would be glad if anyone has
>>any experience with that.
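
The block-based layout Scott describes can be illustrated with a toy container. The sketch below is not the Avro file format and does not use the Avro library: the function names (`write_container`, `read_split`), the layout (a 16-byte sync marker, a 4-byte length, then a zlib-compressed payload) and the block size of 100 records are all assumptions made for illustration. It only aims to show why compressing each block independently lets a reader start at an arbitrary byte offset, scan forward to the next sync marker, and decode from there.

```python
import os
import struct
import zlib

SYNC_SIZE = 16  # hypothetical marker length chosen for this toy format


def write_container(records, sync, block_records=100, level=1):
    """Pack records (bytes objects) into independently compressed blocks."""
    out = bytearray()
    for start in range(0, len(records), block_records):
        chunk = records[start:start + block_records]
        # Length-prefix each record so a block can be decoded on its own.
        raw = b"".join(struct.pack(">I", len(r)) + r for r in chunk)
        payload = zlib.compress(raw, level)  # level 1 favors speed over ratio
        out += sync + struct.pack(">I", len(payload)) + payload
    return bytes(out)


def read_split(data, sync, start, end):
    """Decode every block whose sync marker begins in [start, end)."""
    records = []
    # Scan forward from the split offset to the next block boundary.
    pos = data.find(sync, start)
    while pos != -1 and pos < end:
        (size,) = struct.unpack_from(">I", data, pos + SYNC_SIZE)
        body_start = pos + SYNC_SIZE + 4
        raw = zlib.decompress(data[body_start:body_start + size])
        offset = 0
        while offset < len(raw):
            (n,) = struct.unpack_from(">I", raw, offset)
            records.append(raw[offset + 4:offset + 4 + n])
            offset += 4 + n
        pos = data.find(sync, body_start + size)
    return records


records = [f"record-{i}".encode() for i in range(1000)]
sync = os.urandom(SYNC_SIZE)
data = write_container(records, sync)

# Two "mappers", each given half of the byte range.
mid = len(data) // 2
first = read_split(data, sync, 0, mid)
second = read_split(data, sync, mid, len(data))
```

Each half is decoded without reading the other, and a block is assigned to the split in which its marker begins, so no record is read twice or skipped. That property comes from the container layout rather than from the codec, which is why the choice of deflate or snappy does not affect splittability in this sketch.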