Pig >> mail # user >> How do I load JSON in Pig?


Re: How do I load JSON in Pig?
keep calm
and use elephant-bird
https://github.com/kevinweil/elephant-bird
(JsonLoader source: https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java)

I posted an example here yesterday of how to load tweets in JSON.
Here it goes again; I hope it helps.

    -- Jar names are as in the original post; adjust paths and versions to your install.
    REGISTER 'elephant-bird-core-3.0.0.jar';
    REGISTER 'elephant-bird-pig-3.0.0.jar';
    REGISTER 'google-collections-1.0.jar';
    REGISTER 'json-simple-1.1.jar';

    -- Each input line is parsed into a single map field ($0).
    json_lines = LOAD '/twitter_data/tweets/stream/v1/json/2012_10_10/08'
        USING com.twitter.elephantbird.pig.load.JsonLoader();

    -- Pull two keys out of the map and cast them to chararray.
    geo_tweets = FOREACH json_lines GENERATE
        (CHARARRAY) $0#'id' AS id,
        (CHARARRAY) $0#'geoLocation' AS geoLocation;

    -- Keep only tweets that carry a geoLocation.
    only_not_nulls = FILTER geo_tweets BY geoLocation IS NOT NULL;
    STORE only_not_nulls INTO '/twitter_data/results/geo_tweets';
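
A quick way to sanity-check what the loader produces before running the full job is sketched below. It reuses the jars and input path from the script above; the alias sample_lines is made up for illustration, and the exact schema Pig reports depends on the elephant-bird version, so treat the comments as assumptions rather than guaranteed output.

```pig
-- Same jars and input path as in the script above (assumed to exist on your cluster).
REGISTER 'elephant-bird-core-3.0.0.jar';
REGISTER 'elephant-bird-pig-3.0.0.jar';
REGISTER 'google-collections-1.0.jar';
REGISTER 'json-simple-1.1.jar';

json_lines = LOAD '/twitter_data/tweets/stream/v1/json/2012_10_10/08'
    USING com.twitter.elephantbird.pig.load.JsonLoader();

-- Print the schema Pig sees for the relation; with this loader it is
-- expected to be a single map field, which is why the script uses $0#'key'.
DESCRIBE json_lines;

-- Look at a few raw records to see which keys are actually present.
-- sample_lines is a hypothetical alias used only for this check.
sample_lines = LIMIT json_lines 5;
DUMP sample_lines;
```

If a key such as geoLocation never shows up in the dumped maps, the # lookup returns null for every record and the FILTER step will drop everything, so it is worth checking the key names first.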

Arian Rodrigo Pasquali
FEUP, SAPO Labs
http://www.arianpasquali.com
twitter @arianpasquali

2012/11/18 Dan Young <[EMAIL PROTECTED]>

> Not sure if this helps, but in 0.11 I've been using this on EMR for some of
> our JSON data:
>
> raw = load 'hdfs:///cleaned_logs/clicks2/$year_id/$month_id/part-*' USING
> JsonLoader('a:chararray,at:chararray,c1:(url:chararray,useragent:chararray,referrer:chararray,window:(innerheight:chararray,innerwidth:chararray,outerheight:chararray,outerwidth:chararray),resolution:(height:chararray,width:chararray)),cst:chararray,d:(a:chararray,b:chararray),i:chararray,id:chararray,ip:chararray,k:chararray,l:(lat:chararray,lng:chararray),p:chararray,pv:chararray,sa:chararray,sid:chararray,sst:chararray,t:chararray,uuid:chararray,v:chararray');
>
>
> Regards,
>
> Dano
>
>
>
> On Sat, Nov 17, 2012 at 3:09 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
> > I have some JSON data with a uniform schema. I want to load it in Pig.
> > JsonStorage doesn't work, because the data has no schema.
> >
> > How can I load JSON data in Pig?
> >
> > --
> > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> > datasyndrome.com
> >
>