Pig, mail # user - Use Pig to parse JSON objects

Re: Use Pig to parse JSON objects
Ryan Compton 2013-05-22, 22:37
I've been using Twitter's elephant-bird and have been very happy with
it so far. Here's an example of parsing nested JSON with it:

json_eb = LOAD '$IN_DIRS' USING
com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') as

--parse json with twitter's library
parsed0 = FOREACH json_eb GENERATE  STRSPLIT(json#'id',':').$2 AS
tweetId:chararray, STRSPLIT(json#'actor'#'id',':').$2 AS
userId:chararray, json#'postedTime' AS postedTime:chararray,
json#'twitter_entities'#'urls' AS
On Wed, May 22, 2013 at 10:01 AM, Thomas Edison wrote:
> Hi all,
> I have two fields in my Pig input file.  Let's say product_id and
> description.  Description is a JSON object that describes the
> product.
> Is there anything in Pig, other than writing a custom UDF, to parse the JSON
> object so that I can have something like product_id, product_property,
> product_property_value?  Product_property and product_property_value are
> parsed from the description JSON object.  Also, one product could have
> multiple product_property entries.
> Thanks.
> T.E.
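
For reference, the output shape asked for above (one row per product_id, product_property, product_property_value) can be sketched outside Pig. This is a minimal illustration in plain Python, not Pig Latin and not part of elephant-bird. The function name flatten_product and the sample record are hypothetical, and it assumes the description is a flat JSON object; nested values are passed through unchanged.

```python
import json


def flatten_product(product_id, description_json):
    """Return one (product_id, property, value) tuple per top-level JSON key.

    Hypothetical helper for illustration only: it assumes description_json
    is a JSON object and leaves any nested values as parsed.
    """
    props = json.loads(description_json)
    return [(product_id, key, value) for key, value in props.items()]


# Made-up sample row; the real input would have product_id and description columns.
rows = flatten_product("p1", '{"color": "red", "weight": "2kg"}')
for row in rows:
    print(row)
```

The same one-to-many expansion is what a FLATTEN over a bag of (property, value) tuples would produce in Pig once the description has been parsed.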