Pig >> mail # user >> How do I load JSON in Pig?


Re: How do I load JSON in Pig?
Got it building. Are google collections and json-simple external deps?
On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> It seems that everyone can build elephant-bird but me:
> https://github.com/kevinweil/elephant-bird/issues/272
>
>
> On Sun, Nov 18, 2012 at 7:31 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:
>
>> I don't think you really need to build it.
>> You can find it in any Maven repository.
>>
>> Arian Rodrigo Pasquali
>> FEUP, SAPO Labs
>> http://www.arianpasquali.com
>> twitter @arianpasquali
>>
>>
>>
>> 2012/11/18 Arian Pasquali <[EMAIL PROTECTED]>
>>
>> > You don't need to build either one.
>> > Just download those two jars I used in my example.
>> >
>> > Arian
>> >
>> > On Sunday, November 18, 2012, Russell Jurney wrote:
>> >
>> >> Thanks - looks like I don't have to specify the schema, which is good.
>> >>
>> >> I'll try and build elephant-bird.
>> >>
>> >> Russell Jurney http://datasyndrome.com
>> >>
>> >> On Nov 17, 2012, at 9:30 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > keep calm
>> >> > and use elephant-bird
>> >> > https://github.com/kevinweil/elephant-bird
>> >> > https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java
>> >> >
>> >> >
>> >> > I posted an example here yesterday of how to load tweets in JSON.
>> >> > Here it goes again. I hope it helps.
>> >> >
>> >> >    -- Register the jars used by the script.
>> >> >    register 'elephant-bird-core-3.0.0.jar';
>> >> >    register 'elephant-bird-pig-3.0.0.jar';
>> >> >    register 'google-collections-1.0.jar';
>> >> >    register 'json-simple-1.1.jar';
>> >> >
>> >> >    -- Each JSON line is loaded as a single map ($0); no schema is given.
>> >> >    json_lines = LOAD '/twitter_data/tweets/stream/v1/json/2012_10_10/08'
>> >> >        USING com.twitter.elephantbird.pig.load.JsonLoader();
>> >> >
>> >> >    -- Pull two fields out of the map by key.
>> >> >    geo_tweets = FOREACH json_lines GENERATE
>> >> >        (CHARARRAY) $0#'id' AS id,
>> >> >        (CHARARRAY) $0#'geoLocation' AS geoLocation;
>> >> >
>> >> >    -- Keep only the tweets that have a location, then store them.
>> >> >    only_not_nulls = FILTER geo_tweets BY geoLocation IS NOT NULL;
>> >> >    STORE only_not_nulls INTO '/twitter_data/results/geo_tweets';
>> >> >
>> >> >
>> >> >
>> >> > Arian Rodrigo Pasquali
>> >> > FEUP, SAPO Labs
>> >> > http://www.arianpasquali.com
>> >> > twitter @arianpasquali
>> >> >
>> >> >
>> >> >
>> >> > 2012/11/18 Dan Young <[EMAIL PROTECTED]>
>> >> >
>> >> >> Not sure if this helps, but in 0.11 I've been using this on EMR for some of
>> >> >> our JSON data....
>> >> >>
>> >> >> raw = load 'hdfs:///cleaned_logs/clicks2/$year_id/$month_id/part-*' USING
>> >> >> JsonLoader('a:chararray,at:chararray,c1:(url:chararray,useragent:chararray,referrer:chararray,window:(innerheight:chararray,innerwidth:chararray,outerheight:chararray,outerwidth:chararray),resolution:(height:chararray,width:chararray)),cst:chararray,d:(a:chararray,b:chararray),i:chararray,id:chararray,ip:chararray,k:chararray,l:(lat:chararray,lng:chararray),p:chararray,pv:chararray,sa:chararray,sid:chararray,sst:chararray,t:chararray,uuid:chararray,v:chararray');
>> >> >>
>> >> >>
>> >> >> Regards,
>> >> >>
>> >> >> Dano
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Sat, Nov 17, 2012 at 3:09 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>> >> >>
>> >> >>> I have some JSON data with a uniform schema. I want to load it in Pig.
>> >> >>> JsonStorage doesn't work, because the data has no schema.
>> >> >>>
>> >> >>> How can I load JSON data in Pig?
>> >> >>>
>> >> >>> --
>> >> >>> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
>> >> >>> datasyndrome.com
>> >> >>>
>> >> >>
>> >>
>> >
>> >
>> > --
>> > Sent from Gmail Mobile
>> >
>>
>
>
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
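
For anyone who wants to check the logic of the quoted elephant-bird script on a few sample lines before running it on a cluster, here is a rough Python sketch of what that script appears to do: read one JSON object per line, look up two keys, and keep only records whose geoLocation is present. This is not Pig and not elephant-bird code. The function names, the sample records, and the choice to skip malformed lines silently are all assumptions made for illustration.

```python
import json


def load_json_lines(lines):
    """Yield one dict per line of JSON, roughly like a schema-less line loader.

    Assumption: blank lines, malformed lines, and non-object values are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def geo_tweets(records):
    """Mirror the FOREACH + FILTER steps: project (id, geoLocation), drop nulls."""
    for record in records:
        geo = record.get("geoLocation")
        if geo is None:
            continue  # same effect as FILTER ... BY geoLocation IS NOT NULL
        tweet_id = record.get("id")
        id_text = None if tweet_id is None else str(tweet_id)
        # Nested values are kept as JSON text so both fields are plain strings.
        geo_text = geo if isinstance(geo, str) else json.dumps(geo)
        yield (id_text, geo_text)


# Hypothetical sample input; real tweets carry many more fields.
sample = [
    '{"id": 1, "geoLocation": {"latitude": 41.1, "longitude": -8.6}}',
    '{"id": 2, "geoLocation": null}',
    '{"id": 3}',
    'not json',
]

result = list(geo_tweets(load_json_lines(sample)))
for row in result:
    print(row)
```

Only the first sample record survives, because the other three either have a null or missing geoLocation or are not valid JSON.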