
Pig >> mail # user >> How do I load JSON in Pig?


Re: How do I load JSON in Pig?
Wait... com.twitter.elephantbird.pig.load.JsonLoader() does not infer the
schema from a record, and schema inference is what I was looking for. Looks
like I have to write that myself.

And yes, I understand the tradeoffs in doing so. Assuming that one sample
record represents the schema of the whole dataset is a big assumption.
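A minimal sketch of the schema inference described above, written in Python purely to illustrate the idea (it is not part of elephant-bird or Pig). It reads ONE sample JSON record and emits a Pig schema string in the same `name:type,name:(...)` style that Dan Young passes to JsonLoader further down this thread. The function names, the JSON-to-Pig type mapping (integers to `long`, nulls to `chararray`) and the `value` field name used for arrays of scalars are hypothetical choices, and, as noted above, one record may not represent the whole dataset.

```python
import json


def _scalar_type(value):
    # Hypothetical JSON-to-Pig type mapping. bool is checked first
    # because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    # Strings, and nulls whose type one sample cannot reveal, fall back to chararray.
    return "chararray"


def _field_type(value):
    """Return a Pig type for one JSON value, recursing into objects and arrays."""
    if isinstance(value, dict):
        # A nested JSON object becomes a tuple of named fields.
        return "(" + _fields(value) + ")"
    if isinstance(value, list):
        # A JSON array becomes a bag; only the first element is inspected.
        first = value[0] if value else None
        if isinstance(first, dict):
            return "{(" + _fields(first) + ")}"
        # Hypothetical convention: scalar elements go in a field named "value".
        return "{(value:" + _field_type(first) + ")}"
    return _scalar_type(value)


def _fields(obj):
    # Fields keep the key order of the sample record.
    return ",".join(f"{key}:{_field_type(val)}" for key, val in obj.items())


def infer_pig_schema(sample_line):
    """Infer a Pig schema string from ONE sample JSON record."""
    record = json.loads(sample_line)
    if not isinstance(record, dict):
        raise ValueError("expected one JSON object per line")
    return _fields(record)


if __name__ == "__main__":
    sample = '{"id": "42", "retweets": 3, "l": {"lat": "41.1", "lng": "-8.6"}}'
    print(infer_pig_schema(sample))
```

Run on the sample in the `__main__` block, this prints `id:chararray,retweets:long,l:(lat:chararray,lng:chararray)`. JSON keys that are not valid Pig identifiers would still need renaming before the string is usable.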

On Mon, Nov 19, 2012 at 2:30 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> Talking to myself... never mind, guava and json-simple are included with
> Pig.
>
>
> On Mon, Nov 19, 2012 at 2:27 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> Got it building. Are Google Collections and json-simple external deps?
>>
>>
>> On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>
>>> It seems that everyone can build elephant-bird but me:
>>> https://github.com/kevinweil/elephant-bird/issues/272
>>>
>>>
>>> On Sun, Nov 18, 2012 at 7:31 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:
>>>
>>>> I don't think you really need to build it.
>>>> You can find it in any Maven repository.
>>>>
>>>> Arian Rodrigo Pasquali
>>>> FEUP, SAPO Labs
>>>> http://www.arianpasquali.com
>>>> twitter @arianpasquali
>>>>
>>>>
>>>>
>>>> 2012/11/18 Arian Pasquali <[EMAIL PROTECTED]>
>>>>
>>>> > You don't need to build either.
>>>> > Just download the two jars I used in my example.
>>>> >
>>>> > Arian
>>>> >
>>>> > On Sunday, November 18, 2012, Russell Jurney wrote:
>>>> >
>>>> >> Thanks - looks like I don't have to specify the schema, which is good.
>>>> >>
>>>> >> I'll try and build elephant-bird.
>>>> >>
>>>> >> Russell Jurney http://datasyndrome.com
>>>> >>
>>>> >> On Nov 17, 2012, at 9:30 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:
>>>> >>
>>>> >> > keep calm
>>>> >> > and use elephant-bird
>>>> >> > https://github.com/kevinweil/elephant-bird
>>>> >> > (https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java)
>>>> >> >
>>>> >> > Yesterday I posted an example here of how to load tweets in JSON;
>>>> >> > here it is again. I hope it helps.
>>>> >> >
>>>> >> >    register 'elephant-bird-core-3.0.0.jar'
>>>> >> >    register 'elephant-bird-pig-3.0.0.jar'
>>>> >> >    register 'google-collections-1.0.jar'
>>>> >> >    register 'json-simple-1.1.jar'
>>>> >> >
>>>> >> >    -- elephant-bird's JsonLoader turns each line into a single map field ($0)
>>>> >> >    json_lines = LOAD '/twitter_data/tweets/stream/v1/json/2012_10_10/08'
>>>> >> >        USING com.twitter.elephantbird.pig.load.JsonLoader();
>>>> >> >
>>>> >> >    -- $0#'key' looks up a key in that map; the values are cast to chararray
>>>> >> >    geo_tweets = FOREACH json_lines GENERATE
>>>> >> >        (CHARARRAY) $0#'id' AS id,
>>>> >> >        (CHARARRAY) $0#'geoLocation' AS geoLocation;
>>>> >> >
>>>> >> >    only_not_nulls = FILTER geo_tweets BY geoLocation is not null;
>>>> >> >    store only_not_nulls into '/twitter_data/results/geo_tweets';
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
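To make the semantics of Arian's script concrete, here is a plain-Python model of what it computes (a sketch, not how Pig executes it). It assumes, as the `$0#'id'` lookups suggest, that elephant-bird's JsonLoader yields one map per input line. The function name `select_geo_tweets` is hypothetical, and dropping unparseable lines is an assumption of this model rather than documented loader behavior.

```python
import json


def select_geo_tweets(lines):
    """Plain-Python model of the script's LOAD -> FOREACH ... GENERATE -> FILTER steps."""
    results = []
    for line in lines:
        # LOAD: each line is parsed into one map, which the script calls $0.
        try:
            record = json.loads(line)
        except ValueError:
            continue  # assumption: this model simply drops unparseable lines
        if not isinstance(record, dict):
            continue
        # FOREACH ... GENERATE: $0#'key' is a map lookup; a missing key gives null (None).
        tweet_id = record.get("id")
        geo = record.get("geoLocation")
        # (CHARARRAY) casts, modelled here as str() on non-null values.
        tweet_id = None if tweet_id is None else str(tweet_id)
        geo = None if geo is None else str(geo)
        # FILTER geo_tweets BY geoLocation is not null
        if geo is not None:
            results.append((tweet_id, geo))
    return results


if __name__ == "__main__":
    sample = [
        '{"id": 1, "geoLocation": "41.15,-8.61"}',
        '{"id": 2}',
        '{"id": 3, "geoLocation": null}',
        'not json',
    ]
    print(select_geo_tweets(sample))
```

With the sample input it prints `[('1', '41.15,-8.61')]`: record 2 has no `geoLocation` key and record 3 has a JSON null, so both are filtered out, just as null values are in Pig.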
>>>> >> > 2012/11/18 Dan Young <[EMAIL PROTECTED]>
>>>> >> >
>>>> >> >> Not sure if this helps, but in Pig 0.11 I've been using this on EMR for
>>>> >> >> some of our JSON data:
>>>> >> >>
>>>> >> >> raw = load 'hdfs:///cleaned_logs/clicks2/$year_id/$month_id/part-*'
>>>> >> >>     USING JsonLoader('a:chararray,at:chararray,c1:(url:chararray,useragent:chararray,referrer:chararray,window:(innerheight:chararray,innerwidth:chararray,outerheight:chararray,outerwidth:chararray),resolution:(height:chararray,width:chararray)),cst:chararray,d:(a:chararray,b:chararray),i:chararray,id:chararray,ip:chararray,k:chararray,l:(lat:chararray,lng:chararray),p:chararray,pv:chararray,sa:chararray,sid:chararray,sst:chararray,t:chararray,uuid:chararray,v:chararray');
>>>> >> >>
>>>> >> >>
>>>> >> >> Regards,
>>>> >> >>
>>>> >> >> Dano
>>>> >> >>
>>>> >> >>
>>>> >> >>
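Long inline schema strings like Dan's are easy to mistype. The following is a hypothetical Python helper (not part of Pig) that parses the `name:type` / `name:(...)` subset of Pig schema syntax used above, fails on unbalanced parentheses, and lists each leaf field as a dotted path, which matches how nested tuple fields are dereferenced in a Pig FOREACH (for example `c1.window.innerheight`). Bags (`{...}`) and maps are deliberately not handled, and it is not a full validator.

```python
def parse_fields(schema, pos=0):
    """Parse 'name:type' and 'name:(...)' fields from pos; return (fields, end_pos)."""
    fields = []
    while pos < len(schema) and schema[pos] != ")":
        colon = schema.index(":", pos)  # raises ValueError if no ':' is left
        name = schema[pos:colon].strip()
        pos = colon + 1
        if pos < len(schema) and schema[pos] == "(":
            # Nested tuple: recurse, then expect and skip the closing parenthesis.
            children, pos = parse_fields(schema, pos + 1)
            if pos >= len(schema) or schema[pos] != ")":
                raise ValueError(f"unclosed tuple for field {name!r}")
            pos += 1
            fields.append((name, children))
        else:
            # Scalar type: read up to the next ',' or ')'.
            end = pos
            while end < len(schema) and schema[end] not in ",)":
                end += 1
            fields.append((name, schema[pos:end].strip()))
            pos = end
        if pos < len(schema) and schema[pos] == ",":
            pos += 1
    return fields, pos


def dotted_paths(schema):
    """List every leaf field as 'outer.inner:type'."""
    fields, pos = parse_fields(schema)
    if pos != len(schema):
        raise ValueError(f"unexpected ')' at position {pos}")
    paths = []

    def walk(items, prefix):
        for name, sub in items:
            if isinstance(sub, list):
                walk(sub, prefix + name + ".")
            else:
                paths.append(prefix + name + ":" + sub)

    walk(fields, "")
    return paths


if __name__ == "__main__":
    for path in dotted_paths("a:chararray,l:(lat:chararray,lng:chararray)"):
        print(path)
```

For the short example it prints `a:chararray`, `l.lat:chararray` and `l.lng:chararray`, one per line; running it on Dan's full schema yields 28 leaf paths.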
>>>> >> >> On Sat, Nov 17, 2012 at 3:09 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>> >> >>
>>>> >> >>> I have some JSON data with a uniform schema. I want to load it

Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com