Pig >> mail # user >> How do I load JSON in Pig?


Russell Jurney 2012-11-17, 22:09
Dan Young 2012-11-18, 01:23
Arian Pasquali 2012-11-18, 02:30
Russell Jurney 2012-11-18, 04:32
Russell Jurney 2012-11-18, 17:19
Arian Pasquali 2012-11-18, 22:46
Arian Pasquali 2012-11-19, 00:31
Russell Jurney 2012-11-19, 16:23
Russell Jurney 2012-11-19, 19:27
Russell Jurney 2012-11-19, 19:30
Russell Jurney 2012-11-19, 19:33
Russell Jurney 2012-11-19, 19:35
Deepak Tiwari 2012-11-19, 20:22
Saxifrage Cucvara 2012-11-21, 05:56
Re: How do I load JSON in Pig?
Try

com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
This should allow access to nested objects as nested maps ($0#'level1'#'level2'#'level3' …)
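
To make the chained lookup concrete, here is a minimal Java sketch of what a dereference like json#'data'#'expired_reason' does on nested maps. It uses only the standard library, not Pig or elephant-bird. The class NestedMapLookup, its methods, and the sample values are hypothetical; only the key names come from the quoted query. It also assumes that, when nested loading is not in effect, the loader hands back the nested object as a plain string, which would explain a failing lookup but is not confirmed in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: plain Java maps standing in for Pig map[] values.
final class NestedMapLookup {

    /**
     * Follows a chain of keys through nested maps, similar in spirit to
     * json#'data'#'expired_reason' in Pig. Returns null when a level is
     * missing or when an intermediate value is not a map.
     */
    static Object lookup(Map<String, Object> record, String... path) {
        Object current = record;
        for (String key : path) {
            if (!(current instanceof Map)) {
                return null; // e.g. the nested object arrived as a plain string
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    /** Sample record (made-up values) with 'data' kept as a nested map. */
    static Map<String, Object> nestedRecord() {
        Map<String, Object> data = new HashMap<>();
        data.put("expired_reason", "timeout");
        Map<String, Object> json = new HashMap<>();
        json.put("event", "session_end");
        json.put("uid", "42");
        json.put("data", data);
        return json;
    }

    /** Same record, but with 'data' flattened to its string form (assumed failure case). */
    static Map<String, Object> flatRecord() {
        Map<String, Object> json = new HashMap<>(nestedRecord());
        json.put("data", "{\"expired_reason\":\"timeout\"}");
        return json;
    }

    public static void main(String[] args) {
        // Nested map: the chained lookup reaches the inner value.
        System.out.println(lookup(nestedRecord(), "data", "expired_reason"));
        // Flattened string: there is nothing to dereference, so the result is null.
        System.out.println(lookup(flatRecord(), "data", "expired_reason"));
    }
}
```

Top-level keys such as 'event' and 'uid' resolve in both cases, which matches the symptom of only the top-level fields being queryable.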

David

On Nov 21, 2012, at 12:56 AM, Saxifrage Cucvara <[EMAIL PROTECTED]> wrote:

> I'm also experiencing problems working with JSON objects in Pig.
>
> I have managed to load in a log file in JSON format but only query the top
> level objects.  Whenever I try to call anything that is nested it fails.
>
> -- Register JARS
> register elephant-bird-2.2.3.jar;
> register json-simple-1.1.jar;
>
> -- Load data
> nestobject = LOAD '/Users/Path/GoogleDrive/test.json'
>        USING
> com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad=true')
>        AS (json:map[]);
> DUMP nestobject;
>
> -- Example query
> tester = FOREACH nestobject GENERATE json#'event',json#'uid',
> json#'data'#'expired_reason' as reason;
> DUMP tester;
>
> The above fails ...
>
> Does anyone have any ideas?
>
> Thanks
>
> Sax
>
> On 20 November 2012 07:22, Deepak Tiwari <[EMAIL PROTECTED]> wrote:
>
>> I also ran into the same dilemma. Here is something that I found easier and
>> that works for me. I compiled some sources from http://www.json.org/java/
>>
>>
>> import java.io.IOException;
>> import java.io.UnsupportedEncodingException;
>> import java.util.List;
>>
>> import org.apache.pig.EvalFunc;
>> import org.apache.pig.data.Tuple;
>> import org.apache.pig.data.TupleFactory;
>> import org.json.JSONArray;
>> import org.json.JSONException;
>> import org.json.JSONObject;
>>
>>
>> public class JsonParser extends EvalFunc<Tuple> {
>>    @Override
>>    public Tuple exec(Tuple input) throws IOException {
>>        TupleFactory tf = TupleFactory.getInstance();
>>        Tuple t = tf.newTuple();
>>
>>
>>        if ( input.get(0) != null ){
>>            String inString = (String) input.get(0);
>>            try {
>>                JSONObject jsn = new JSONObject(inString);
>>                t.append(getJsonArr(jsn));
>>            } catch (JSONException e) {
>>                e.printStackTrace();
>>            }
>>        }
>>        return t;
>>    }
>>
>>    private String getJsonArr(JSONObject jsn) {
>>        String jsnArrVal = "";
>>
>>        try {
>>            if (!jsn.has("jsonKey"))
>>                return null;
>>            JSONArray jTagArray = jsn.getJSONArray("jsonKey");
>>            for (int i=0; i<jTagArray.length(); i++){
>>                JSONObject hst = jTagArray.getJSONObject(i);
>>                // Assign to the existing variable; redeclaring it here would not compile.
>>                jsnArrVal = hst.getString("text") + jsnArrVal;
>>            }
>>        } catch (JSONException e) {
>>            // TODO Auto-generated catch block
>>            e.printStackTrace();
>>        }
>>        return jsnArrVal;
>>    }
>> }
>>
>>
>> On Mon, Nov 19, 2012 at 11:35 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>
>>> Ok, it's even worse. My data is a big array.
>>>
>>> Am I being negative in saying that JSON and Pig is like a nightmare?
>>>
>>>
>>> On Mon, Nov 19, 2012 at 2:33 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>
>>>> Wait... com.twitter.elephantbird.pig.load.JsonLoader() does not infer the
>>>> schema from a record. This is what I was looking for. Looks like I have to
>>>> write that myself.
>>>>
>>>> And yes, I understand the tradeoffs in doing so. Assuming a sample is the
>>>> overall schema is a big assumption.
>>>>
>>>>
>>>>
>>>> On Mon, Nov 19, 2012 at 2:30 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Talking to myself... never mind, guava and json-simple are included with
>>>>> Pig.
>>>>>
>>>>>
>>>>> On Mon, Nov 19, 2012 at 2:27 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Got it building. Are google collections and json-simple external deps?
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> It seems that everyone can build elephant-bird but me:

Saxifrage Cucvara 2012-11-21, 22:36
Adam Kawa 2012-11-17, 23:40
Russell Jurney 2012-11-18, 22:46