Avro, mail # user - map/reduce of compressed Avro


Re: map/reduce of compressed Avro
Enns, Steven 2013-04-29, 22:53
Out of curiosity, are there any other file formats that provide splittable
gzip compression like Avro object containers? I can only think of Hadoop
SequenceFiles.
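The reason a block-based container stays splittable whatever codec it uses can be shown with a minimal Python sketch of a toy format loosely modeled on Avro object container files. Records are grouped into blocks, each block is compressed on its own with raw deflate at level 1, and a random 16-byte sync marker follows every block. This is not the real Avro binary format: the header, schema, and Avro's variable-length block counts are omitted, the fixed 4-byte length fields are a simplification, and `write_container` and `read_split` are hypothetical names. A reader handed an arbitrary byte range scans forward to the first sync marker and decodes whole blocks from there, so no external index is needed.

```python
import os
import struct
import zlib

SYNC_SIZE = 16  # Avro container files also use a 16-byte sync marker


def deflate_raw(data: bytes, level: int = 1) -> bytes:
    # wbits=-15 -> raw deflate stream (no zlib header or checksum)
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def inflate_raw(data: bytes) -> bytes:
    return zlib.decompress(data, -15)


def write_container(records, block_size, level=1):
    """Toy writer: [sync] then repeated [count][size][compressed payload][sync]."""
    sync = os.urandom(SYNC_SIZE)
    out = bytearray(sync)  # stand-in for a real header, which would end with the sync marker
    for i in range(0, len(records), block_size):
        block = records[i:i + block_size]
        # Length-prefix each record so the block can be parsed after decompression.
        payload = b"".join(struct.pack(">I", len(r)) + r for r in block)
        compressed = deflate_raw(payload, level)  # each block is compressed independently
        out += struct.pack(">II", len(block), len(compressed))
        out += compressed
        out += sync
    return bytes(out), sync


def read_split(data, sync, start, end):
    """Return records of every block whose preceding sync marker starts in [start, end)."""
    records = []
    pos = data.find(sync, start)  # skip forward to the first block boundary
    while pos != -1 and pos < end:
        block_start = pos + SYNC_SIZE
        if block_start >= len(data):  # trailing marker: no more blocks
            break
        count, size = struct.unpack_from(">II", data, block_start)
        body_start = block_start + 8
        payload = inflate_raw(data[body_start:body_start + size])
        off = 0
        for _ in range(count):
            (n,) = struct.unpack_from(">I", payload, off)
            records.append(payload[off + 4:off + 4 + n])
            off += 4 + n
        pos = body_start + size  # the next sync marker should start here
        if data[pos:pos + SYNC_SIZE] != sync:
            raise ValueError("corrupt block: sync marker not found")
    return records


if __name__ == "__main__":
    recs = [b"record-%d" % i for i in range(1000)]
    blob, marker = write_container(recs, block_size=100)
    third = len(blob) // 3
    parts = [read_split(blob, marker, s, e)
             for s, e in [(0, third), (third, 2 * third), (2 * third, len(blob))]]
    assert sum(parts, []) == recs  # every record is read exactly once across the splits
    print([len(p) for p in parts])
```

Because each block carries its own compressed payload, swapping deflate for Snappy (or any other codec) would only change `deflate_raw` and `inflate_raw`; the sync markers, not the codec, are what make the file splittable. A whole-file gzip stream has no such restart points, which is why plain `.gz` input cannot be split.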

On 4/29/13 3:47 PM, "Scott Carey" <[EMAIL PROTECTED]> wrote:

>Martin said it already, but I will emphasize:
>
>Avro data files are splittable and can support multiple mappers no matter
>what codec is used for compression. This is because Avro files are
>block-based and apply compression only within each block. I recommend
>starting with deflate (gzip-style) compression, and moving to Snappy only
>if deflate at compression level 1 is not fast enough.
>
>For more information on avro data files, see:
>http://avro.apache.org/docs/current/spec.html#Object+Container+Files
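The object container spec linked above writes each block's object count and its byte size (after the codec is applied) as Avro `long`s, which are zig-zag encoded and then written as a variable-length base-128 integer. The minimal Python sketch below shows that encoding; `zigzag`, `encode_long`, and `decode_long` are hypothetical helper names, not part of any Avro library, and the code assumes values fit in a signed 64-bit long.

```python
def zigzag(n: int) -> int:
    # Map signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3
    # (assumes n fits in a signed 64-bit long)
    return (n << 1) ^ (n >> 63)


def unzigzag(z: int) -> int:
    return (z >> 1) ^ -(z & 1)


def encode_long(n: int) -> bytes:
    """Zig-zag, then varint: 7 bits per byte, low bits first, high bit set = more bytes follow."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int = 0):
    """Return (value, position just past the encoded long)."""
    z = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return unzigzag(z), pos


# Values from the spec's zig-zag example table
assert encode_long(0) == b"\x00"
assert encode_long(-1) == b"\x01"
assert encode_long(1) == b"\x02"
assert encode_long(-64) == b"\x7f"
assert encode_long(64) == b"\x80\x01"

# A block header is two such longs: object count, then byte size of the block data
header = encode_long(100) + encode_long(2048)
count, pos = decode_long(header)
size, pos = decode_long(header, pos)
assert (count, size, pos) == (100, 2048, len(header))
```

Small counts and sizes therefore cost only one or two bytes each, and a reader that has just found a sync marker can parse the block header without knowing anything about the codec.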
>
>
>
>On 4/22/13 11:47 PM, "nir_zamir" <[EMAIL PROTECTED]> wrote:
>
>>Thanks Martin.
>>
>>What will happen if I try to use an indexed LZO-compressed Avro file? Will
>>it work and utilize the index to allow multiple mappers?
>>
>>I think that for Snappy, for example, the file is splittable and can use
>>multiple mappers, but I haven't tested it yet. I would be glad to hear from
>>anyone who has experience with that.
>>
>>Thanks!
>>Nir.
>>
>>
>>
>>--
>>View this message in context:
>>http://apache-avro.679487.n3.nabble.com/map-reduce-of-compressed-Avro-tp4026947p4027009.html
>>Sent from the Avro - Users mailing list archive at Nabble.com.
>
>