

Re: map/reduce of compressed Avro
To my knowledge, LZO is not a supported codec for Avro data files. It's
possible that you have an LZO-compressed Hadoop sequence file containing
Avro records, but that would be a format you defined yourself, not the
same as an Avro data file.

Avro data files are designed to be splittable regardless of the codec they
use, so you can have multiple mappers that each consume a portion of the
input file. The format achieves that by breaking the data into blocks, and
compressing each block separately; hence it can be split at block
boundaries.
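The block-level compression described above can be sketched in a few lines. This is an illustrative toy, not the real Avro container format (which adds schemas, sync markers, and binary encoding): each block of records is compressed independently, so a reader assigned one block can decompress it without touching any earlier block.

```python
import zlib

# Toy sketch of block-wise compression (the property that makes Avro
# data files splittable), NOT the actual Avro container format.
records = [f"record-{i}".encode() for i in range(100)]
block_size = 25  # records per block

# Writer: split records into blocks and compress each block separately.
blocks = []
for start in range(0, len(records), block_size):
    block = b"\n".join(records[start:start + block_size])
    blocks.append(zlib.compress(block))

# Reader ("mapper") assigned only the third block: it decompresses that
# block alone, with no dependency on the bytes that came before it.
third_block = zlib.decompress(blocks[2]).split(b"\n")
print(third_block[0])  # b'record-50'
```

In the real format, a 16-byte sync marker between blocks lets a mapper seek into the middle of a file and scan forward to the next block boundary before it starts reading.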

Best,
Martin
On 22 April 2013 23:47, nir_zamir <[EMAIL PROTECTED]> wrote:

> Thanks Martin.
>
> What will happen if I try to use an indexed LZO-compressed Avro file? Will
> it work and use the index to allow multiple mappers?
>
> I think that with Snappy, for example, the file is splittable and can use
> multiple mappers, but I haven't tested it yet. I'd be glad to hear from
> anyone with experience of that.
>
> Thanks!
> Nir.
>
>
>
> --
> View this message in context:
> http://apache-avro.679487.n3.nabble.com/map-reduce-of-compressed-Avro-tp4026947p4027009.html
> Sent from the Avro - Users mailing list archive at Nabble.com.
>