Re: map/reduce of compressed Avro
Martin said it already, but I will emphasize:

Avro data files are splittable and can support multiple mappers no matter
what codec is used for compression.  This is because Avro files are block
based, and compression is applied only within each block.  I recommend
starting with deflate (gzip-style) compression, and moving to snappy only if
deflate compression level '1' is not fast enough.
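
To illustrate why block-level compression keeps a file splittable, here is a
minimal Python sketch. It is a toy container, not the real Avro byte layout:
the marker placement, the 4-byte length field, and the names write_blocks and
read_split are assumptions made up for this example. Each block is compressed
on its own (zlib at level 1, standing in for the deflate codec), so a reader
handed an arbitrary byte range only has to find the next sync marker.

```python
import os
import struct
import zlib

# Random 16-byte marker separating blocks (toy stand-in for a sync marker).
SYNC = os.urandom(16)


def write_blocks(records, per_block=3, level=1):
    """Pack string records into independently compressed blocks."""
    out = bytearray()
    for i in range(0, len(records), per_block):
        payload = "\n".join(records[i:i + per_block]).encode("utf-8")
        compressed = zlib.compress(payload, level)
        # Block = marker + 4-byte big-endian length + compressed payload.
        out += SYNC + struct.pack(">I", len(compressed)) + compressed
    return bytes(out)


def read_split(data, start, end):
    """Return records from blocks whose marker begins in [start, end)."""
    records = []
    # Skip forward to the first marker at or after the split start.
    pos = data.find(SYNC, start)
    while pos != -1 and pos < end:
        header_end = pos + len(SYNC) + 4
        (size,) = struct.unpack(">I", data[pos + len(SYNC):header_end])
        payload = zlib.decompress(data[header_end:header_end + size])
        records.extend(payload.decode("utf-8").split("\n"))
        pos = data.find(SYNC, header_end + size)
    return records


records = [f"record-{n}" for n in range(10)]
data = write_blocks(records)

# Two "mappers", each given half of the byte range.
mid = len(data) // 2
first_half = read_split(data, 0, mid)
second_half = read_split(data, mid, len(data))
```

The split point falls at an arbitrary byte, yet each block is read by exactly
one of the two readers, because a block belongs to whichever range contains
the start of its marker. Swapping zlib for another codec would not change the
splitting logic.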

For more information on avro data files, see:
http://avro.apache.org/docs/current/spec.html#Object+Container+Files

On 4/22/13 11:47 PM, "nir_zamir" <[EMAIL PROTECTED]> wrote:

>Thanks Martin.
>
>What will happen if I try to use an indexed LZO-compressed avro file? Will
>it work and utilize the index to allow multiple mappers?
>
>I think that for Snappy for example, the file is splittable and can use
>multiple mappers, but I haven't tested it yet - would be glad if anyone has
>any experience with that.
>
>Thanks!
>Nir.
>
>
>
>--
>View this message in context:
>http://apache-avro.679487.n3.nabble.com/map-reduce-of-compressed-Avro-tp4026947p4027009.html
>Sent from the Avro - Users mailing list archive at Nabble.com.