Pig >> mail # user >> JsonLoader schema field order shouldn't matter


Re: JsonLoader schema field order shouldn't matter
This seems like a bug to me. It makes it risky to work with JSON data
generated by something other than Pig since the ordering might change.
What do you think?

I didn't see a bug for it in Jira, so would this (still open) one be
the place to mention it? Or should I make a new one?
https://issues.apache.org/jira/browse/PIG-1914

~T
On 7 January 2013 20:24, Alan Gates <[EMAIL PROTECTED]> wrote:
> Currently the JsonLoader does assume ordering of the fields.  It does not do any name matching against the given schema to find the right field.
>
> Alan.
>
> On Jan 7, 2013, at 11:56 AM, Tim Sell wrote:
>
>> When using JsonLoader with Pig 0.10.0
>>
>> if I have an input.json file that looks like this:
>>
>> {"date": "2007-08-25", "id": 16}
>> {"date": "2007-09-08", "id": 17}
>> {"date": "2007-09-15", "id": 18}
>>
>> And I use
>>
>> a = LOAD 'input.json' USING JsonLoader('id:int,date:chararray');
>> DUMP a;
>>
>> I get errors when it tries to force the date fields into an integer.
>>
>> Shouldn't this work independently of the ordering of the schema fields?
>> JSON writers generally don't make guarantees about key ordering.
>>
>> One alternative (though annoying) would be to use Elephant Bird
>> instead, but I can't get that to compile against Hadoop 2.0.0 and Pig
>> 0.10.0.
>>
>> ~Tim
>
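
To make the difference between the two behaviours concrete, here is a minimal Python sketch. It is not Pig's JsonLoader code; the SCHEMA list and the two function names are made up for illustration. It contrasts matching JSON values to schema fields by position (the behaviour Alan describes) with matching them by name (the behaviour Tim expected), using the three records from the original message.

```python
import json

# Hypothetical stand-in for JsonLoader('id:int,date:chararray').
SCHEMA = [("id", int), ("date", str)]

LINES = [
    '{"date": "2007-08-25", "id": 16}',
    '{"date": "2007-09-08", "id": 17}',
    '{"date": "2007-09-15", "id": 18}',
]


def load_by_position(line, schema):
    """Pair the i-th JSON value with the i-th schema field, ignoring keys."""
    # json.loads keeps keys in document order (dicts are ordered in Python 3.7+).
    values = list(json.loads(line).values())
    return tuple(cast(value) for (_, cast), value in zip(schema, values))


def load_by_name(line, schema):
    """Look each schema field up by its key; missing keys become None."""
    record = json.loads(line)
    return tuple(
        cast(record[name]) if name in record else None
        for name, cast in schema
    )


if __name__ == "__main__":
    for line in LINES:
        print(load_by_name(line, SCHEMA))
    try:
        load_by_position(LINES[0], SCHEMA)
    except ValueError as err:
        # The date string lands in the int slot, as in the reported error.
        print("positional load failed:", err)
```

With name matching, each record yields an (id, date) tuple regardless of key order. With positional matching, the first record fails because the date string is cast to an integer, which mirrors the error reported above.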