-
reader in hadoop without reader's schema
Koert Kuipers 2011-12-02, 03:32
I am reading from Avro container files in Hadoop. I know the container files have a (writer's) schema stored in them. My reader specifies its schema using the avro.input.schema job parameter. This way any schema changes are handled gracefully, with both schemas present.
However, I don't always need all this complexity. Is there a way to read without having to specify a reader's schema, where I basically say "just accept the writer's schema and read the data that way"?
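For readers unfamiliar with the file layout mentioned above: the container file header carries the writer's schema as JSON under the metadata key avro.schema. The following is a minimal sketch, using only the Java standard library, of pulling that schema out of a header. The class name AvroHeaderSketch and its helper methods are hypothetical names made up for this illustration, and sampleHeader builds a tiny in-memory header rather than reading a real file. In practice the Avro library does this for you.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class, not part of the Avro library.
public class AvroHeaderSketch {

    // Decode one Avro long: variable-length, zigzag-encoded.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Avro "bytes" and "string" values: a long length followed by that many bytes.
    static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }

    // Read the magic bytes and the metadata map that open a container file.
    static Map<String, byte[]> readHeader(DataInputStream in) throws IOException {
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro container file");
        }
        Map<String, byte[]> meta = new LinkedHashMap<>();
        // A map is a series of blocks; a block count of zero ends the map.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {          // negative count: a byte size follows
                count = -count;
                readLong(in);         // block size in bytes, not needed here
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        return meta;
    }

    // Return the writer's schema JSON stored in the header.
    static String writerSchema(InputStream raw) throws IOException {
        byte[] schema = readHeader(new DataInputStream(raw)).get("avro.schema");
        if (schema == null) throw new IOException("header has no avro.schema entry");
        return new String(schema, StandardCharsets.UTF_8);
    }

    // Encode one Avro long (used only to build the sample header below).
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Build a minimal in-memory header: magic, one metadata entry, zeroed sync marker.
    static byte[] sampleHeader(String schemaJson) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('O'); out.write('b'); out.write('j'); out.write(1);
        byte[] key = "avro.schema".getBytes(StandardCharsets.UTF_8);
        byte[] value = schemaJson.getBytes(StandardCharsets.UTF_8);
        writeLong(out, 1);                               // one key/value pair in this block
        writeLong(out, key.length);   out.write(key, 0, key.length);
        writeLong(out, value.length); out.write(value, 0, value.length);
        writeLong(out, 0);                               // end of metadata map
        out.write(new byte[16], 0, 16);                  // 16-byte sync marker
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] header = sampleHeader("{\"type\":\"string\"}");
        System.out.println(writerSchema(new ByteArrayInputStream(header)));
    }
}
```

Because the schema travels with the data in this way, a reader can in principle decode the records without being given any schema of its own.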
-
Re: reader in hadoop without reader's schema
Doug Cutting 2011-12-05, 23:50
On 12/01/2011 07:32 PM, Koert Kuipers wrote:
> I am reading from avro container files in hadoop. I know the container
> files have a (writers) schema stored in them. My reader specifies it's
> schema using avro.input.schema job parameter. This way any schema
> changes are gracefully handled with both schema's present.
>
> However, i dont always need all this complexity. Is there a way to read
> without having to specify a reader's schema, where i basically say "just
> accept the writer's schema and read the data that way".
That's what's done by default if you, e.g., do something like:
    Iterable i = DataFileReader.openReader(file, new GenericDatumReader());
    for (Object o : i) {
      System.out.println(o);
    }
Doug
-
Re: reader in hadoop without reader's schema
Koert Kuipers 2011-12-06, 15:16
What about if I use AvroInputFormat? I tried setting the input schema to null, but that did not work.
-
Re: reader in hadoop without reader's schema
Doug Cutting 2011-12-06, 18:18
On 12/06/2011 07:16 AM, Koert Kuipers wrote:
> What about if I use AvroInputFormat? I tried setting the input schema to
> null but that did not work
Yes, it looks like that would not currently work. Please file a Jira issue if you require this. It should be a simple modification to AvroRecordReader.java, plus adding a test for it.
Thanks,
Doug
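The behavior requested in this thread amounts to a fallback rule: use the configured reader's schema when one is set, otherwise use the writer's schema from the file. The sketch below shows only that rule, under stated assumptions. It is not the actual AvroRecordReader code; the class ReaderSchemaFallback and the method effectiveReaderSchema are hypothetical names, and a plain Map stands in for the Hadoop job configuration. Only the avro.input.schema key comes from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; a plain Map stands in for the job configuration.
public class ReaderSchemaFallback {

    // Job parameter named in the thread.
    static final String INPUT_SCHEMA_KEY = "avro.input.schema";

    // Pick the schema to read with: the configured one if present,
    // otherwise the writer's schema taken from the container file.
    static String effectiveReaderSchema(Map<String, String> jobConf, String writerSchemaJson) {
        String configured = jobConf.get(INPUT_SCHEMA_KEY);
        if (configured == null || configured.isEmpty()) {
            return writerSchemaJson;
        }
        return configured;
    }

    public static void main(String[] args) {
        String writerSchema = "{\"type\":\"string\"}";
        Map<String, String> conf = new HashMap<>();

        // No reader's schema configured: fall back to the writer's schema.
        System.out.println(effectiveReaderSchema(conf, writerSchema));

        // Reader's schema configured: it takes precedence.
        conf.put(INPUT_SCHEMA_KEY, "{\"type\":\"bytes\"}");
        System.out.println(effectiveReaderSchema(conf, writerSchema));
    }
}
```

Treating an unset parameter as "use the writer's schema" leaves existing jobs that do set avro.input.schema unaffected.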