-
Anonymous record schemas in data files
Eric Sammer 2013-03-04, 07:50
All:
I'm looking for some clarity on the use of anonymous records in Avro data files. Is this considered legal? Avro 1.7.3 lets you write a data file with DataFileWriter using an anonymous record schema that then can't be read back, which is not the nicest behavior. Here's a contrived example of such a data file:
esammer:~/ esammer$ ~/bin/avro-tool getmeta 1362381940987-1
Exception in thread "main" org.apache.avro.SchemaParseException: No name in schema: {"type":"record","fields":[{"name":"word","type":"string"}]}
    at org.apache.avro.Schema.getRequiredText(Schema.java:1198)
    at org.apache.avro.Schema.parse(Schema.java:1066)
    at org.apache.avro.Schema$Parser.parse(Schema.java:927)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)
    at org.apache.avro.Schema.parse(Schema.java:974)
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:124)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:89)
    at org.apache.avro.tool.DataFileGetMetaTool.run(DataFileGetMetaTool.java:63)
    at org.apache.avro.tool.Main.run(Main.java:78)
    at org.apache.avro.tool.Main.main(Main.java:67)
Before filing a bug, I wanted to confirm that anonymous records are against the spec (or that they aren't, in which case the bug is in the schema parser).
Thanks.
--
Eric Sammer
twitter: esammer
data: www.cloudera.com
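The stack trace shows the failure happens when the reader re-parses the schema JSON stored in the file header (DataFileStream.initialize calling Schema.parse). A minimal Scala sketch of just that parsing step, assuming an Avro jar on the classpath; it uses the non-deprecated `Schema.Parser` instead of the static `Schema.parse` seen in the trace, and the object name `AnonRecordParse` is illustrative only:

```scala
import org.apache.avro.{Schema, SchemaParseException}

object AnonRecordParse {
  // The writer's schema exactly as printed in the exception: a record with no "name".
  val anonJson: String = """{"type":"record","fields":[{"name":"word","type":"string"}]}"""

  // Right(schema) if the JSON parses; Left(message) if the parser rejects it.
  def tryParse(json: String): Either[String, Schema] =
    try Right(new Schema.Parser().parse(json))
    catch { case e: SchemaParseException => Left(e.getMessage) }

  def main(args: Array[String]): Unit =
    tryParse(anonJson) match {
      case Left(msg)     => println(s"rejected: $msg")
      case Right(schema) => println(s"accepted: $schema")
    }
}
```

Because the writer side never runs this check, the problem only surfaces when someone tries to open the file.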
-
Re: Anonymous record schemas in data files
Francis Galiegue 2013-03-04, 09:34
Records must have a name. http://avro.apache.org/docs/current/spec.html#schema_record says so.
--
Francis Galiegue, [EMAIL PROTECTED]
JSON Schema in Java: http://json-schema-validator.herokuapp.com
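Giving the record a name should be enough to make the same kind of file readable. A sketch of the write/read round trip in Scala, assuming an Avro jar on the classpath; the record name `Word`, the namespace `example`, and the object name are arbitrary choices for illustration:

```scala
import java.io.File
import org.apache.avro.Schema
import org.apache.avro.file.{DataFileReader, DataFileWriter}
import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord, GenericRecordBuilder}

object NamedRecordRoundTrip {
  // Same shape as the failing schema, plus the "name" the spec requires.
  val namedJson: String =
    """{"type":"record","name":"Word","namespace":"example","fields":[{"name":"word","type":"string"}]}"""

  // Writes one record to a temp file, reads it back, returns (schema full name, field value).
  def roundTrip(): (String, String) = {
    val schema = new Schema.Parser().parse(namedJson)
    val file = File.createTempFile("named", ".avro")
    file.deleteOnExit()

    val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
    writer.create(schema, file)
    writer.append(new GenericRecordBuilder(schema).set("word", "bar").build())
    writer.close()

    // The reader takes the writer's schema from the file header; parsing it succeeds now.
    val reader = new DataFileReader[GenericRecord](file, new GenericDatumReader[GenericRecord]())
    try {
      val rec = reader.next()
      (reader.getSchema.getFullName, rec.get("word").toString)
    } finally reader.close()
  }

  def main(args: Array[String]): Unit = println(roundTrip())
}
```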
-
Re: Anonymous record schemas in data files
Doug Cutting 2013-03-04, 17:31
As Francis noted, anonymous records are not permitted. That said, the runtime uses anonymous record schemas internally to implement message parameter lists (which are written and read like records, but don't have names).
How did you manage to create a file containing an anonymous record? Perhaps the API lets you create anonymous record schemas? If so, we should probably fix that, so they're only created by the Protocol parser via a package-private API.
Doug
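To look at the internal use Doug mentions, one can parse a small protocol and inspect a message's request schema. This is a sketch assuming an Avro jar on the classpath; the protocol `Example`, the message `send`, and its parameter are made up, and the code only checks that the parameter list comes back as a record-typed schema with one field per parameter, not how the runtime names it:

```scala
import org.apache.avro.{Protocol, Schema}

object MessageParams {
  // Hypothetical protocol with one message taking a single string parameter.
  val protocolJson: String =
    """{"protocol": "Example", "namespace": "example", "types": [],
      | "messages": {"send": {"request": [{"name": "word", "type": "string"}], "response": "null"}}}""".stripMargin

  // The request (parameter list) schema of the "send" message.
  def requestSchema(): Schema =
    Protocol.parse(protocolJson).getMessages.get("send").getRequest

  def main(args: Array[String]): Unit = {
    val req = requestSchema()
    // Parameters are carried as the fields of a record-typed schema.
    println(s"${req.getType} with ${req.getFields.size()} field(s): ${req.getFields.get(0).name()}")
  }
}
```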
-
Re: Anonymous record schemas in data files
Eric Sammer 2013-03-04, 17:57
Freaky. The following works just fine.
scala> val anonSchema = Schema.createRecord(Lists.newArrayList(new Field("foo", Schema.create(Type.STRING), null, null)))
anonSchema: org.apache.avro.Schema
{"type":"record","fields":[{"name":"foo","type":"string"}]}

scala> val writer = new DataFileWriter[Record](new GenericDatumWriter[Record](anonSchema))
writer: org.apache.avro.file.DataFileWriter[org.apache.avro.generic.GenericData.Record] = org.apache.avro.file.DataFileWriter@417f6125

scala> writer.create(anonSchema, new File("test-anon.avro"))
res0: org.apache.avro.file.DataFileWriter[org.apache.avro.generic.GenericData.Record] = org.apache.avro.file.DataFileWriter@417f6125

scala> writer.append(new GenericRecordBuilder(anonSchema).set("foo", "bar").build())

scala> writer.flush()

scala> writer.close()
Of course, test-anon.avro can't be read back in any meaningful way, which is the problem. I'll file a JIRA. The broader issue is that if Schema allows such a case, semantic validation has to be duplicated in many places. I've been whining about the awkwardness of the Schema APIs (to Doug, at the office) for some time now. Maybe it's time we provided a set of builders that ensure semantic validity upon construction. I wouldn't mind putting in the work.
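A builder along the lines suggested here could reject unnamed records before any schema object exists. The sketch below is hypothetical plain Scala with no Avro dependency: `RecordSpec`, `FieldSpec`, `builder` and `requireValidName` are invented names, and only the name rule (start with `[A-Za-z_]`, then only `[A-Za-z0-9_]`) comes from the Avro specification.

```scala
// Hypothetical sketch: none of these types are part of Avro.
object ValidatingBuilderSketch {
  // Avro spec rule: a name starts with [A-Za-z_] and continues with [A-Za-z0-9_].
  def requireValidName(kind: String, name: String): Unit =
    require(name != null && name.matches("[A-Za-z_][A-Za-z0-9_]*"), s"invalid $kind name: '$name'")

  final case class FieldSpec(name: String, typeName: String)

  // Private constructor: a RecordSpec can only come out of RecordSpec.builder(...).build().
  final class RecordSpec private (val name: String, val fields: List[FieldSpec])

  object RecordSpec {
    // The record name is validated up front, so an unnamed record can never be built.
    def builder(name: String): Builder = {
      requireValidName("record", name)
      new Builder(name, Vector.empty)
    }

    final class Builder private[ValidatingBuilderSketch] (name: String, fields: Vector[FieldSpec]) {
      // Returns a new immutable Builder after checking the field name and its uniqueness.
      def field(fieldName: String, typeName: String): Builder = {
        requireValidName("field", fieldName)
        require(!fields.exists(_.name == fieldName), s"duplicate field name: '$fieldName'")
        new Builder(name, fields :+ FieldSpec(fieldName, typeName))
      }

      def build(): RecordSpec = new RecordSpec(name, fields.toList)
    }
  }
}
```

Since the only path to an instance runs through the checks, downstream code such as a file writer would not need to re-validate; the cost is that every schema-producing API has to go through the builder.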
-
Re: Anonymous record schemas in data files
Doug Cutting 2013-03-04, 18:16
Schema.createRecord(List<Field>) should not be used except when creating protocol parameter list schemas. We should deprecate it in the next point release and make it package-private in the following release. There are a few calls in other packages that use this, but these could be replaced with calls to a new Protocol#createMessageParameters method.
Doug
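The two-step plan above can be modelled in plain Scala. In this hypothetical sketch (not Avro's code; every name, including the stand-in for `Protocol#createMessageParameters`, is illustrative), the public anonymous-record overload is deprecated, while the construction path it delegates to is already scoped with `private[SchemaApiSketch]`, playing the role of Java's package-private access, so only code inside that scope can build unnamed records:

```scala
// Hypothetical model of the deprecation plan; not Avro's implementation.
object SchemaApiSketch {
  final case class Field(name: String, typeName: String)

  // name == None marks an anonymous record, reserved for message parameter lists.
  final class RecordSchema private[SchemaApiSketch] (val name: Option[String], val fields: List[Field])

  object Schema {
    // Supported public path: a record must be named.
    def createRecord(name: String, fields: List[Field]): RecordSchema = {
      require(name != null && name.nonEmpty, "records must have a name")
      new RecordSchema(Some(name), fields)
    }

    // Step 1 (next point release): the anonymous overload stays public but is deprecated.
    // Step 2 (following release): it would become private[SchemaApiSketch], like anonymousRecord.
    @deprecated("anonymous records are only valid as protocol message parameters", "next point release")
    def createRecord(fields: List[Field]): RecordSchema = anonymousRecord(fields)

    private[SchemaApiSketch] def anonymousRecord(fields: List[Field]): RecordSchema =
      new RecordSchema(None, fields)
  }

  object Protocol {
    // Stand-in for the proposed Protocol#createMessageParameters.
    def createMessageParameters(params: List[Field]): RecordSchema = Schema.anonymousRecord(params)
  }
}
```

Callers outside the scope that still use `createRecord(fields)` get a compiler deprecation warning in step 1, which gives them a release to move to the named overload.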