Pig, mail # user - Parsing a Complex JSON String?

Re: Parsing a Complex JSON String?
Eli Finkelshteyn 2013-03-01, 22:37
Hi Harsha,
Those functions look potentially awesome, but there doesn't seem to be much documentation on which one to use for what. I've tried parsing my JSON with both JsonTupleMap and JsonMap, and I get a com/fasterxml/jackson/core/JsonParseException with both. I was just running:

grunt> REGISTER '/path/to/elephant-bird-pig-3.0.3-SNAPSHOT.jar';
grunt> REGISTER '/path/to/json-simple-1.1.1.jar';
grunt> REGISTER '/path/to/piggybank.jar';
grunt> REGISTER '/path/to/joda-time-2.1.jar';
grunt> REGISTER '/path/to/akela-0.5-SNAPSHOT.jar';
grunt> DEFINE JsonStringToMap com.twitter.elephantbird.pig.piggybank.JsonStringToMap();
grunt> DEFINE JsonTupleMap com.mozilla.pig.eval.json.JsonTupleMap();
grunt> loaded = LOAD '/path/to/test-files/*' AS (date:chararray, source:chararray, json_string:chararray);
grunt> jsonified = FOREACH loaded GENERATE JsonTupleMap(json_string) AS json, date, source;  
2013-03-01 14:28:29,485 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/fasterxml/jackson/core/JsonParseException

Any ideas?
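
One possible cause, offered as a guess rather than a confirmed diagnosis: when Pig reports ERROR 2998 with a slash-separated class name like this, it usually means the class could not be loaded (a NoClassDefFoundError), not that the JSON itself was malformed. com.fasterxml.jackson.core.JsonParseException lives in the Jackson 2.x jackson-core jar, which is not among the jars registered above. A minimal sketch, assuming the akela UDFs need Jackson 2.x on the classpath; the jar paths and the "2.x" version are placeholders, and the need for jackson-databind and jackson-annotations is an assumption:

```pig
-- Hypothetical paths and versions; substitute the Jackson 2.x jars you actually have.
REGISTER '/path/to/jackson-core-2.x.jar';         -- provides com.fasterxml.jackson.core.JsonParseException
REGISTER '/path/to/jackson-databind-2.x.jar';     -- assumption: may be needed by the UDFs
REGISTER '/path/to/jackson-annotations-2.x.jar';  -- assumption: dependency of jackson-databind
REGISTER '/path/to/akela-0.5-SNAPSHOT.jar';
DEFINE JsonTupleMap com.mozilla.pig.eval.json.JsonTupleMap();
```

If the error persists after registering these, the cause is something else and the full stack trace in the Pig log file would be the next place to look.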


On Feb 28, 2013, at 1:44 PM, Harsha wrote:

> Hi Eli,  
>     Take a look at these:
> https://github.com/mozilla-metrics/akela/tree/master/src/main/java/com/mozilla/pig/eval/json. We use them to parse complex JSON objects.
> Thanks,
> Harsha
> On Thursday, February 28, 2013 at 10:44 AM, Eli Finkelshteyn wrote:
>> Hi Folks,
>> I want to parse a string of complex JSON in Pig. Specifically, I want Pig to understand my JSON array as a bag instead of as a single chararray. When using JsonLoader, I can do this easily by specifying the schema, as in this question (http://stackoverflow.com/questions/14094768/parsing-complex-json-with-pig). Is there any way either to have Pig figure out my schema for me, or to specify it when Pig is parsing a string? I've been using JsonStringToMap, but can't find a way to specify a schema, or to have it properly understand that my JSON array is an array and not a single chararray. I looked at the code in JsonStringToMap, and it looks like it always specifies the schema for me as just a map of chararrays, which won't work for anything but the simplest JSON of a form like {string: string…}. Any ideas?
>> Eli
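
For reference, the JsonLoader approach mentioned in the original question can be sketched as follows. This is a minimal illustration, assuming Pig 0.10 or later (where JsonLoader is built in) and a hypothetical input file whose records look like {"name":"a","items":[{"id":1},{"id":2}]}; the path and field names are invented for the example. It only covers loading whole JSON records from a file, not parsing a JSON string held in a chararray field, which is the open question in this thread.

```pig
-- Hypothetical input path and schema; the bag schema makes Pig treat the JSON array as a bag of tuples.
records = LOAD '/path/to/input.json'
          USING JsonLoader('name:chararray, items:{(id:int)}');
-- Each element of the JSON array becomes one tuple in the bag, so FLATTEN yields one row per element.
flat = FOREACH records GENERATE name, FLATTEN(items);
DUMP flat;
```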