Pig >> mail # user >> Nested JSON Strings - How to Ingest and Manipulate?


Anurag Gulati 2012-04-04, 22:37
Anurag Gulati 2012-04-04, 22:44
Re: Nested JSON Strings - How to Ingest and Manipulate?
Anurag - once you have elephant-bird parse the JSON into maps, you can
extract the nested JSON elements just as you would with any other map,
using the '#' projection operator.  In other words, the following generates
3-element tuples containing id, name, and link:

A = LOAD ...
B = FOREACH A GENERATE
com.twitter.elephantbird.pig.piggybank.JsonStringToMap(line) as json;
C = FOREACH B GENERATE json#'id', json#'name', json#'link';
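
As a hedged follow-on sketch, the same '#' lookup can also be used inside a
FILTER, for instance to keep only the rows whose top-level 'gender' value is
'male'.  The relation names D and E are made up for illustration, the LOAD is
left elided as above, and the explicit (chararray) cast is only a precaution in
case the map values come back untyped.  Whether a nested object such as
'hometown' can be reached the same way depends on how the loader represents
nested values, which this sketch does not assume.

```pig
-- Continues from relation B above: one map per input line, in the field 'json'.
-- Keep only the rows whose top-level 'gender' value equals 'male'.
D = FILTER B BY (chararray) json#'gender' == 'male';

-- Project two top-level fields from the surviving rows.
E = FOREACH D GENERATE json#'id' AS id, json#'name' AS name;
```

Running DESCRIBE on B first would show how the map values are typed, which
tells you whether the cast is actually needed.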

Norbert

On Wed, Apr 4, 2012 at 6:44 PM, Anurag Gulati <[EMAIL PROTECTED]> wrote:

> One error in my original message:
>
> I've been able to use ElephantBird to ingest the data ... but the deep
> nested structures are converted into MAPS (not bags) ... which I'm having a
> hard time working with.
>
>
> Thx!
>
> -----Original Message-----
> From: Anurag Gulati
> Sent: Wednesday, April 04, 2012 3:38 PM
> To: [EMAIL PROTECTED]
> Subject: Nested JSON Strings - How to Ingest and Manipulate?
>
> Hi Guys!!
>
> I'm over here trying to get my feet wet with Hadoop and my first task just
> happens to be a complex one.
> I was hoping you could help me out.
>
> I'm trying to read nested JSON structures (data received from Facebook)
> into Pig; then I'd like to be able to manipulate the data (e.g., return all
> lines where hometown = Phoenix, Arizona).
>
> I have a single file with multiple lines of JSON.  Each line is a single
> entry.  An example of one line is below:
>
> {"id":"10011666","name":"Test
> user","first_name":"Test","last_name":"user","link":"http://
> www.facebook.com\/test.user","username":"test.user","birthday":"09\/19\/1983","hometown":{"id":"103102203064024","name":"West
> Chester, Pennsylvania"},"location":{"id":"","name":null},"bio":"This is my
> Bio. I'm a geek that love to hack (in a good way)","quotes":"I like quotes.
> But I'm shortening this section cuz it was
> wild!","work":[{"employer":{"id":"6185812851","name":"American
> Eagle"},"location":{"id":"105540216147364","name":"Phoenix,
> Arizona"},"position":{"id":"133619273341785","name":"Counter
> Guy"},"start_date":"2012-01"},{"employer":{"id":"190876464341724","name":"Cardiac
> group"},"position":{"id":"105630109469647","name":"Executive
> Producer"},"description":"We create music for Artist Placement and
> TV\/Film.","start_date":"2002-01"},{"employer":{"id":"6185812851","name":"American
> Eagle"},"location":{"id":"105540216147364","name":"Phoenix,
> Arizona"},"position":{"id":"116439401740213","name":"Floor
> Guy"},"start_date":"2007-10","end_date":"2012-01"},{"employer":{"id":"110067355684846","name":"Saint
> Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix,
> Arizona"},"position":{"id":"202489236428627","name":"Pharmacy IT
> Coordinator"},"start_date":"2005-10","end_date":"2007-10"},{"employer":{"id":"110067355684846","name":"Saint
> Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix,
> Arizona"},"position":{"id":"144703015548786","name":"Pharmacy
> Tech"},"start_date":"2001-02","end_date":"2005-10"}],"sports":[{"id":"108606435830479","name":"Karate"}],"favorite_teams":[{"id":"87169796810","name":"Philadelphia
> Flyers"},{"id":"93625750491","name":"Philadelphia
> Phillies"},{"id":"45898408995","name":"Phoenix
> Suns"},{"id":"120163518021430","name":"Philadelphia
> Eagles"}],"favorite_athletes":[{"id":"77922840249","name":"Steve
> Nash"},{"id":"105590659475179","name":"Wayne
> Gretzky"},{"id":"62975399193","name":"Michael
> Jordan"}],"inspirational_people":[{"id":"106676942701904","name":"Gandhi"}],"education":[{"school":{"id":"109324275761313","name":"Corona
> del Sol High School"},"type":"High
> School"},{"school":{"id":"23680344606","name":"Arizona State
> University"},"type":"College"}],"gender":"male","interested_in":["female"],"relationship_status":"Single","religion":"Hinduism
> (One with all things)","political":"Liberal (Left of
> Center)","email":"app+22c90gj.9hh9d.f7304b58ac646e08b5f0f10a73547e34\
> u0040proxymail.facebook.com","website":"www.slashdot.org\r\
> nwww.gizmodo.com<http://www.slashdot.org/r/nwww.gizmodo.com