Avro >> mail # user >> Using AVRO C with a large schema

We have a C program that prepares many GB of data for later analysis.  We'd like to serialize this data using AVRO C.  Here are some statements that I hope are wrong.

1. There's a 1:1 relationship between schema and file.  You can't mix different schemas in the same file.

2. Each value written to a file represents the file's full schema.  You can't write pieces of a schema.

3. AVRO C cannot write values that are bigger than the file writer's specified block_size.  I don't think there's enough memory to hold both the original structures and a gigantic block_size.

What's my best course of action?  Split the structures and arrays across multiple files?