Re: Nested JSON Strings - How to Ingest and Manipulate?
Anurag - once you have elephant-bird parse the JSON into maps, you
extract the nested JSON elements just as you would with any other map,
using the '#' projection operator.  For example, the following generates
3-element tuples containing id, name, and link:

A = LOAD ...
B = FOREACH A GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(line) as json;
C = FOREACH B GENERATE json#'id', json#'name', json#'link';
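
To get at a nested field such as the hometown name and filter on it, the same pattern can be extended. What follows is only a rough sketch under several assumptions: the elephant-bird jars and their JSON dependency are already REGISTERed, 'profiles.json' is a hypothetical path with one JSON object per line, and JsonStringToMap parses only one level, so the nested "hometown" object comes back as a JSON string that has to be parsed a second time. If your loader already hands you a nested map, the second parse should not be needed. The alias names are made up for illustration.

```pig
-- Assumption: elephant-bird and its JSON dependency are already REGISTERed.
DEFINE JsonStringToMap com.twitter.elephantbird.pig.piggybank.JsonStringToMap();

-- 'profiles.json' is a hypothetical path; TextLoader reads each line as one chararray.
A = LOAD 'profiles.json' USING TextLoader() AS (line:chararray);
B = FOREACH A GENERATE JsonStringToMap(line) AS json;

-- Assumption: the nested "hometown" value is a JSON string at this point,
-- so it is parsed again to obtain a map of its own.
C = FOREACH B GENERATE
        json#'id'   AS id,
        json#'name' AS name,
        JsonStringToMap((chararray) json#'hometown') AS hometown;

-- Project the nested name with '#', then filter on it.
D = FOREACH C GENERATE id, name, hometown#'name' AS hometown_name;
E = FILTER D BY hometown_name == 'West Chester, Pennsylvania';
DUMP E;
```

Splitting the second parse and the '#' lookup into separate FOREACH steps keeps each expression simple. Records with no hometown yield a null hometown_name and are dropped by the FILTER.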

Norbert

On Wed, Apr 4, 2012 at 6:44 PM, Anurag Gulati <[EMAIL PROTECTED]> wrote:

> One error in my original message:
>
> I've been able to use ElephantBird to ingest the data ... but the deeply
> nested structures are converted into MAPS (not bags) ... which I'm having a
> hard time working with.
>
>
> Thx!
>
> -----Original Message-----
> From: Anurag Gulati
> Sent: Wednesday, April 04, 2012 3:38 PM
> To: [EMAIL PROTECTED]
> Subject: Nested JSON Strings - How to Ingest and Manipulate?
>
> Hi Guys!!
>
> I'm over here trying to get my feet wet with Hadoop and my first task just
> happens to be a complex one.
> I was hoping you could help me out.
>
> I'm trying to read nested JSON structures (data received from Facebook)
> into Pig; then I'd like to be able to manipulate the data (e.g., return all
> lines where hometown = Phoenix, Arizona).
>
> I have a single file with multiple lines of JSON.  Each line is a single
> entry.  An example of one line is below:
>
> {"id":"10011666","name":"Test
> user","first_name":"Test","last_name":"user","link":"http://www.facebook.com\/test.user","username":"test.user","birthday":"09\/19\/1983","hometown":{"id":"103102203064024","name":"West
> Chester, Pennsylvania"},"location":{"id":"","name":null},"bio":"This is my
> Bio. I'm a geek that love to hack (in a good way)","quotes":"I like quotes.
> But I'm shortening this section cuz it was
> wild!","work":[{"employer":{"id":"6185812851","name":"American
> Eagle"},"location":{"id":"105540216147364","name":"Phoenix,
> Arizona"},"position":{"id":"133619273341785","name":"Counter
> Guy"},"start_date":"2012-01"},{"employer":{"id":"190876464341724","name":"Cardiac
> group"},"position":{"id":"105630109469647","name":"Executive
> Producer"},"description":"We create music for Artist Placement and
> TV\/Film.","start_date":"2002-01"},{"employer":{"id":"6185812851","name":"American
> Eagle"},"location":{"id":"105540216147364","name":"Phoenix,
> Arizona"},"position":{"id":"116439401740213","name":"Floor
> Guy"},"start_date":"2007-10","end_date":"2012-01"},{"employer":{"id":"110067355684846","name":"Saint
> Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix,
> Arizona"},"position":{"id":"202489236428627","name":"Pharmacy IT
> Coordinator"},"start_date":"2005-10","end_date":"2007-10"},{"employer":{"id":"110067355684846","name":"Saint
> Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix,
> Arizona"},"position":{"id":"144703015548786","name":"Pharmacy
> Tech"},"start_date":"2001-02","end_date":"2005-10"}],"sports":[{"id":"108606435830479","name":"Karate"}],"favorite_teams":[{"id":"87169796810","name":"Philadelphia
> Flyers"},{"id":"93625750491","name":"Philadelphia
> Phillies"},{"id":"45898408995","name":"Phoenix
> Suns"},{"id":"120163518021430","name":"Philadelphia
> Eagles"}],"favorite_athletes":[{"id":"77922840249","name":"Steve
> Nash"},{"id":"105590659475179","name":"Wayne
> Gretzky"},{"id":"62975399193","name":"Michael
> Jordan"}],"inspirational_people":[{"id":"106676942701904","name":"Gandhi"}],"education":[{"school":{"id":"109324275761313","name":"Corona
> del Sol High School"},"type":"High
> School"},{"school":{"id":"23680344606","name":"Arizona State
> University"},"type":"College"}],"gender":"male","interested_in":["female"],"relationship_status":"Single","religion":"Hinduism
> (One with all things)","political":"Liberal (Left of
> Center)","email":"app+22c90gj.9hh9d.f7304b58ac646e08b5f0f10a73547e34\u0040proxymail.facebook.com","website":"www.slashdot.org\r\
> nwww.gizmodo.com<http://www.slashdot.org/r/nwww.gizmodo.com