Pig >> mail # user >> Use Pig to parse JSON objects


Thomas Edison 2013-05-22, 17:01
Re: Use Pig to parse JSON objects
I've been using Twitter's Elephant Bird library and have been very happy
with it so far. Here's an example of parsing nested JSON with it:

json_eb = LOAD '$IN_DIRS'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (json:map[]);

-- parse the JSON with Twitter's library
parsed0 = FOREACH json_eb GENERATE
    STRSPLIT(json#'id', ':').$2 AS tweetId:chararray,
    STRSPLIT(json#'actor'#'id', ':').$2 AS userId:chararray,
    json#'postedTime' AS postedTime:chararray,
    json#'twitter_entities'#'urls' AS userPostedLinks:bag{T:(urlTypes:map[])};

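
For the "one product could have multiple properties" part of the question, a rough sketch of how the nested bag might be turned into one row per element is below. It continues from the parsed0 relation in the example and uses Pig's built-in FLATTEN. The relation names links and urls and the key 'expanded_url' are assumptions for illustration, not something taken from the original data; substitute whatever keys your JSON actually holds.

```pig
-- Sketch only: continues from the parsed0 relation defined above.
-- FLATTEN un-nests the bag, emitting one row per URL entry per tweet.
links = FOREACH parsed0 GENERATE
    tweetId,
    FLATTEN(userPostedLinks) AS (urlTypes:map[]);

-- 'expanded_url' is an assumed key, used here only as an example lookup.
urls = FOREACH links GENERATE
    tweetId,
    (chararray) urlTypes#'expanded_url' AS expandedUrl;
```

The same pattern should carry over to the product case: keep product_id alongside the flattened bag so each output row pairs the id with one nested entry.
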
On Wed, May 22, 2013 at 10:01 AM, Thomas Edison
<[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I have two fields in my Pig input file, say product_id and
> description.  Description is a JSON object that describes the
> product.
>
> Is there anything in Pig, other than writing a custom UDF, to parse the JSON
> object so that I get something like product_id, product_property,
> product_property_value?  product_property and product_property_value are
> parsed from the description JSON object.  Also, one product could have
> multiple product_property entries.
>
> Thanks.
>
> T.E.