Pig >> mail # user >> How do I load JSON in Pig?


Re: How do I load JSON in Pig?
They come prebuilt? Neat!

Russell Jurney twitter.com/rjurney
On Nov 18, 2012, at 5:31 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:

> You don't need to build it either.
> Just download those two jars I used in my example.
>
> Arian
>
> On Sunday, November 18, 2012, Russell Jurney wrote:
>
>> Thanks - looks like I don't have to specify the schema, which is good.
>> I'll try and build elephant-bird.
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Nov 17, 2012, at 9:30 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:
>>
>>> keep calm
>>> and use elephant-bird
>>> https://github.com/kevinweil/elephant-bird
>>> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java
>>>
>>>
>>> I posted an example here yesterday of how to load tweets in JSON.
>>> Here it is again. I hope it helps.
>>>
>>> register 'elephant-bird-core-3.0.0.jar';
>>> register 'elephant-bird-pig-3.0.0.jar';
>>> register 'google-collections-1.0.jar';
>>> register 'json-simple-1.1.jar';
>>>
>>> -- each input line is loaded as a single map field ($0)
>>> json_lines = LOAD '/twitter_data/tweets/stream/v1/json/2012_10_10/08'
>>>     USING com.twitter.elephantbird.pig.load.JsonLoader();
>>>
>>> -- look up two keys in the map and cast the values to chararray
>>> geo_tweets = FOREACH json_lines GENERATE
>>>     (CHARARRAY) $0#'id' AS id,
>>>     (CHARARRAY) $0#'geoLocation' AS geoLocation;
>>>
>>> only_not_nulls = FILTER geo_tweets BY geoLocation is not null;
>>> store only_not_nulls into '/twitter_data/results/geo_tweets';
>>>
>>>
>>>
>>> Arian Rodrigo Pasquali
>>> FEUP, SAPO Labs
>>> http://www.arianpasquali.com
>>> twitter @arianpasquali
>>>
>>>
>>>
>>> 2012/11/18 Dan Young <[EMAIL PROTECTED]>
>>>
>>>> Not sure if this helps, but in 0.11 I've been using this on EMR for some
>>>> of our JSON data:
>>>>
>>>> raw = load 'hdfs:///cleaned_logs/clicks2/$year_id/$month_id/part-*' USING
>>>>     JsonLoader('a:chararray,at:chararray,c1:(url:chararray,useragent:chararray,referrer:chararray,window:(innerheight:chararray,innerwidth:chararray,outerheight:chararray,outerwidth:chararray),resolution:(height:chararray,width:chararray)),cst:chararray,d:(a:chararray,b:chararray),i:chararray,id:chararray,ip:chararray,k:chararray,l:(lat:chararray,lng:chararray),p:chararray,pv:chararray,sa:chararray,sid:chararray,sst:chararray,t:chararray,uuid:chararray,v:chararray');
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Dano
>>>>
>>>>
>>>>
>>>> On Sat, Nov 17, 2012 at 3:09 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> I have some JSON data with a uniform schema. I want to load it in Pig.
>>>>> JsonStorage doesn't work, because the data has no schema.
>>>>>
>>>>> How can I load JSON data in Pig?
>>>>>
>>>>> --
>>>>> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
>>>>> datasyndrome.com
>>>>>
>>>>
>>
>
>
> --
> Sent from Gmail Mobile
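
For readers who want to check the logic of the elephant-bird script quoted above outside a cluster, here is a rough Python sketch of the same three steps: parse each line into a map, project the 'id' and 'geoLocation' keys as strings, and drop rows where geoLocation is null. It is only an illustration, not how JsonLoader is implemented. The function names and the sample lines are made up, and silently skipping lines that fail to parse is an assumption about the loader's behaviour.

```python
import json


def load_json_lines(lines):
    """Yield one dict per input line, like a schema-less map-per-record loader.

    Assumption: blank lines and lines that are not a JSON object are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def to_chararray(value):
    """Mimic the (CHARARRAY) cast: None stays None, anything else becomes a string."""
    return None if value is None else str(value)


def geo_tweets(lines):
    """Project id and geoLocation, then keep rows whose geoLocation is not null."""
    projected = (
        (to_chararray(rec.get("id")), to_chararray(rec.get("geoLocation")))
        for rec in load_json_lines(lines)
    )
    return [row for row in projected if row[1] is not None]


# Hypothetical sample input: one JSON document per line.
sample = [
    '{"id": 1, "geoLocation": "41.15,-8.61"}',
    '{"id": 2}',
    '{"id": 3, "geoLocation": null}',
    'not json',
    '',
]
```

Calling geo_tweets(sample) keeps only the first record, since the others either lack a geoLocation value or are not valid JSON.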