Avro >> mail # user >> map/reduce of compressed Avro


Re: map/reduce of compressed Avro
Out of curiosity, are there any other file formats that provide splittable
gzip compression like Avro object containers?  I can only think of
Sequence Files.

On 4/29/13 3:47 PM, "Scott Carey" <[EMAIL PROTECTED]> wrote:

>Martin said it already, but I will emphasize:
>
>Avro data files are splittable and can support multiple mappers no matter
>what codec is used for compression.  This is because Avro files are
>block-based, and only use the compression within the block.  I recommend
>starting with gzip-style compression (Avro's deflate codec), and moving to
>snappy only if deflate compression level '1' is not fast enough.
>
>For more information on avro data files, see:
>http://avro.apache.org/docs/current/spec.html#Object+Container+Files
>
>
>
>On 4/22/13 11:47 PM, "nir_zamir" <[EMAIL PROTECTED]> wrote:
>
>>Thanks Martin.
>>
>>What will happen if I try to use an indexed LZO-compressed avro file? Will
>>it work and utilize the index to allow multiple mappers?
>>
>>I think that for Snappy for example, the file is splittable and can use
>>multiple mappers, but I haven't tested it yet - would be glad if anyone has
>>any experience with that.
>>
>>Thanks!
>>Nir.
>>
>>
>>
>>--
>>View this message in context:
>>http://apache-avro.679487.n3.nabble.com/map-reduce-of-compressed-Avro-tp4026947p4027009.html
>>Sent from the Avro - Users mailing list archive at Nabble.com.
>
>
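
To illustrate the point made in the thread (compression applied only inside
each block, with blocks separated by a sync marker, keeps a file splittable
whatever the codec), here is a minimal sketch in Python. It is a toy format,
not the real Avro container layout: the function names write_blocks and
read_split are hypothetical, the 4-byte length prefix stands in for Avro's
variable-length integers, and zlib level 1 is used as a stand-in for the
deflate codec at level '1'. Records are assumed not to contain newlines.

```python
import os
import zlib

SYNC_SIZE = 16  # Avro also uses a 16-byte sync marker; the rest is a toy layout


def write_blocks(records, sync, records_per_block=2, level=1):
    """Compress records block by block; every block ends with the sync marker."""
    out = bytearray()
    for i in range(0, len(records), records_per_block):
        payload = b"\n".join(records[i:i + records_per_block])
        compressed = zlib.compress(payload, level)  # codec is applied per block only
        out += len(compressed).to_bytes(4, "big") + compressed + sync
    return bytes(out)


def read_split(data, sync, start, end):
    """Return the records of every block whose first byte lies in [start, end)."""
    if start == 0:
        pos = 0
    else:
        # Block starts sit right after a sync marker, so scan for the first
        # marker that ends at or after `start`. (A random 16-byte marker
        # colliding with compressed bytes is treated as negligible here.)
        idx = data.find(sync, max(start - len(sync), 0))
        if idx == -1:
            return []
        pos = idx + len(sync)
    records = []
    while pos < end and pos < len(data):
        size = int.from_bytes(data[pos:pos + 4], "big")
        payload = zlib.decompress(data[pos + 4:pos + 4 + size])
        records.extend(payload.split(b"\n"))
        pos += 4 + size + len(sync)
    return records


if __name__ == "__main__":
    sync = os.urandom(SYNC_SIZE)
    recs = [("record-%d" % i).encode() for i in range(10)]
    data = write_blocks(recs, sync)
    mid = len(data) // 2  # an arbitrary byte offset, as an input split would be
    print(read_split(data, sync, 0, mid))
    print(read_split(data, sync, mid, len(data)))
```

Each "mapper" here starts at an arbitrary byte offset, skips forward to the
next sync marker, and decompresses whole blocks only, so every record is read
exactly once regardless of where the split boundary falls. A file compressed
as one continuous gzip stream offers no such restart points.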