-
JsonLoader schema field order shouldn't matter
Tim Sell 2013-01-07, 19:56
When using JsonLoader with Pig 0.10.0, if I have an input.json file that looks like this:

    {"date": "2007-08-25", "id": 16}
    {"date": "2007-09-08", "id": 17}
    {"date": "2007-09-15", "id": 18}

And I use

    a = LOAD 'input.json' USING JsonLoader('id:int,date:chararray');
    DUMP a;

I get errors when it tries to force the date fields into an integer.

Shouldn't this work independent of the ordering of the schema fields? JSON writers generally don't make guarantees about the ordering.

One alternative (though annoying) would be to use elephant bird instead, but I can't get that to compile against hadoop 2.0.0 and Pig 0.10.0.

~Tim
-
Re: JsonLoader schema field order shouldn't matter
Alan Gates 2013-01-07, 20:24
Currently the JsonLoader does assume ordering of the fields. It does not do any name matching against the given schema to find the right field.

Alan.
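To make the difference concrete, here is a minimal Python sketch of the two strategies. It is not Pig's JsonLoader code; the SCHEMA list and the load_by_position / load_by_name helpers are hypothetical names chosen for illustration, and the schema simply mirrors 'id:int,date:chararray' from the original post.

```python
import json

# Hypothetical schema mirroring JsonLoader('id:int,date:chararray') from the thread.
SCHEMA = [("id", int), ("date", str)]


def load_by_position(line, schema=SCHEMA):
    """Pair the i-th JSON value with the i-th schema field, ignoring key names."""
    # json.loads returns a dict that keeps the key order of the input text.
    values = list(json.loads(line).values())
    return tuple(cast(value) for (_, cast), value in zip(schema, values))


def load_by_name(line, schema=SCHEMA):
    """Look up each schema field by key, so the order in the file is irrelevant."""
    record = json.loads(line)
    return tuple(cast(record[name]) for name, cast in schema)


line = '{"date": "2007-08-25", "id": 16}'

print(load_by_name(line))  # (16, '2007-08-25')

try:
    load_by_position(line)
except ValueError as err:
    # int("2007-08-25") fails, analogous to the date being forced into an integer.
    print("positional load failed:", err)
```

With positional matching, the only way to read the sample file is to declare the schema in the same order as the keys appear in the data; name matching removes that constraint.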
-
Re: JsonLoader schema field order shouldn't matter
meghana narasimhan 2013-01-07, 21:32
Hi Tim,

We are using elephant-bird 3.0.2 with hadoop-2.0.0-mr1-cdh4.1.1 and pig-0.10.0-cdh4.1.1. We are using the jar available in the maven repo. Didn't have to build it ourselves.

- Meg
-
Re: JsonLoader schema field order shouldn't matter
Tim Sell 2013-01-08, 01:02
This seems like a bug to me. It makes it risky to work with JSON data generated by something other than Pig since the ordering might change. What do you think?

I didn't see a bug for it in Jira, so would this (still open) one be the place to mention it? Or should I make a new one?
https://issues.apache.org/jira/browse/PIG-1914

~T
-
Re: JsonLoader schema field order shouldn't matter
Tim Sell 2013-01-08, 01:03
Hmm,

I was using pretty much the same setup and got errors complaining about Counter being an interface when it expected a class. I'll try again with the jars straight out of maven tomorrow.

Thanks.

~T
-
Re: JsonLoader schema field order shouldn't matter
Alan Gates 2013-01-08, 17:38
I would open a new JIRA, since 1914 is focused on building an alternative that discovers schema, while you want to improve the existing one.

Alan.
-
Re: JsonLoader schema field order shouldn't matter
Dmitriy Ryaboy 2013-01-11, 03:35
Tim, can you open a github issue with EB about compiling against 0.10? I think this is an easy fix.
-
Re: JsonLoader schema field order shouldn't matter
Ruslan Al-Fakikh 2013-04-05, 00:51
Tim,
have you resolved the issue of using elephant-bird with Pig 0.10?

meghana,
I am using the same configuration:

    pig -version
    Apache Pig version 0.10.0-cdh4.1.1 (rexported)

    hadoop version
    Hadoop 2.0.0-cdh4.1.1

and getting the same error that Tim described:

    java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected

Can you please give an example of your Pig script? I am running it with the following commands:

    REGISTER elephant-bird-pig-3.0.2.jar;
    inputData = LOAD 'sample_simple.json' USING com.twitter.elephantbird.pig.load.JsonLoader() as (json:map[]);
    DUMP inputData;

Thanks in advance