Pig >> mail # user >> How do I load JSON in Pig?


Re: How do I load JSON in Pig?
I ran into the same dilemma. Here is something that I found easier and that
works for me. I compiled some sources from http://www.json.org/java/ and used
them in this UDF:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

public class JsonParser extends EvalFunc<Tuple> {
    @Override
    public Tuple exec(Tuple input) throws IOException {
        TupleFactory tf = TupleFactory.getInstance();
        Tuple t = tf.newTuple();
        // Field 0 is expected to hold one JSON object as a chararray.
        if (input.get(0) != null) {
            String inString = (String) input.get(0);
            try {
                JSONObject jsn = new JSONObject(inString);
                t.append(getJsonArr(jsn));
            } catch (JSONException e) {
                // Malformed JSON: print the error and return an empty tuple.
                e.printStackTrace();
            }
        }
        return t;
    }

    // Concatenates the "text" field of every object in the "jsonKey" array.
    // Each value is prepended, so the result is in reverse array order.
    // Returns null when the record has no "jsonKey" field.
    private String getJsonArr(JSONObject jsn) {
        String jsnArrVal = "";

        try {
            if (!jsn.has("jsonKey"))
                return null;
            JSONArray jTagArray = jsn.getJSONArray("jsonKey");
            for (int i = 0; i < jTagArray.length(); i++) {
                JSONObject hst = jTagArray.getJSONObject(i);
                jsnArrVal = hst.getString("text") + jsnArrVal;
            }
        } catch (JSONException e) {
            e.printStackTrace();
        }
        return jsnArrVal;
    }
}
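
One thing to be aware of in getJsonArr: each "text" value is placed in front of the accumulated string, so the values come out in reverse array order with no separator. Here is a minimal stdlib-only sketch of that behavior next to an in-order alternative. The class name, method names, and sample values are made up for illustration; they are not part of the UDF or of any library.

```java
import java.util.Arrays;
import java.util.List;

// Stdlib-only illustration of the concatenation order; not part of the UDF.
final class ConcatOrderSketch {

    // Mirrors the loop in getJsonArr: each value goes in front of the
    // accumulated string, so the last array element ends up first.
    static String prepend(List<String> texts) {
        String acc = "";
        for (String text : texts) {
            acc = text + acc;
        }
        return acc;
    }

    // Alternative: keep array order and put a separator between values.
    static String appendInOrder(List<String> texts, String separator) {
        return String.join(separator, texts);
    }

    public static void main(String[] args) {
        // Hypothetical "text" values taken from a "jsonKey" array.
        List<String> texts = Arrays.asList("pig", "json", "udf");
        System.out.println(prepend(texts));            // udfjsonpig
        System.out.println(appendInOrder(texts, ",")); // pig,json,udf
    }
}
```

If the order of the values matters downstream, the in-order variant with a separator is probably easier to split again in Pig.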
On Mon, Nov 19, 2012 at 11:35 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> OK, it's even worse. My data is a big array.
>
> Am I being negative in saying that JSON and Pig are like a nightmare?
>
>
> On Mon, Nov 19, 2012 at 2:33 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
> > Wait... com.twitter.elephantbird.pig.load.JsonLoader() does not infer the
> > schema from a record. This is what I was looking for. Looks like I have to
> > write that myself.
> >
> > And yes, I understand the tradeoffs in doing so. Assuming that a sample
> > represents the overall schema is a big assumption.
> >
> >
> >
> > On Mon, Nov 19, 2012 at 2:30 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> >
> >> Talking to myself... never mind, guava and json-simple are included with
> >> Pig.
> >>
> >>
> >> On Mon, Nov 19, 2012 at 2:27 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> >>
> >>> Got it building. Are google collections and json-simple external deps?
> >>>
> >>>
> >>> On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney <
> >>> [EMAIL PROTECTED]> wrote:
> >>>
> >>>> It seems that everyone can build elephant-bird but me:
> >>>> https://github.com/kevinweil/elephant-bird/issues/272
> >>>>
> >>>>
> >>>> On Sun, Nov 18, 2012 at 7:31 PM, Arian Pasquali <
> >>>> [EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> I don't think you really need to build it.
> >>>>> You can find it in any Maven repository.
> >>>>>
> >>>>> Arian Rodrigo Pasquali
> >>>>> FEUP, SAPO Labs
> >>>>> http://www.arianpasquali.com
> >>>>> twitter @arianpasquali
> >>>>>
> >>>>>
> >>>>>
> >>>>> 2012/11/18 Arian Pasquali <[EMAIL PROTECTED]>
> >>>>>
> >>>>> > You don't need to build either.
> >>>>> > Just download the two jars I used in my example.
> >>>>> >
> >>>>> > Arian
> >>>>> >
> >>>>> > On Sunday, November 18, 2012, Russell Jurney wrote:
> >>>>> >
> >>>>> >> Thanks - looks like I don't have to specify the schema, which is
> >>>>> >> good.
> >>>>> >>
> >>>>> >> I'll try and build elephant-bird.
> >>>>> >>
> >>>>> >> Russell Jurney http://datasyndrome.com
> >>>>> >>
> >>>>> >> On Nov 17, 2012, at 9:30 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:
> >>>>> >>
> >>>>> >> > keep calm
> >>>>> >> > and use elephant-bird
> >>>>> >> > https://github.com/kevinweil/elephant-bird
> >>>>> >> > https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java
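
On the point in the quoted thread about inferring a schema from a record: a rough stdlib-only sketch of what "write that myself" could look like is below. It assumes the sample record has already been parsed into a java.util.Map by whatever JSON library is in use, and it only maps top-level fields to Pig type names. The class and method names are hypothetical, nested schemas are not handled, and, as noted in the thread, treating one sample as the overall schema is a big assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: derives field:type pairs from one sample record.
final class SampleSchemaSketch {

    // Maps an already-parsed JSON value to a Pig type name.
    static String pigType(Object value) {
        if (value instanceof Integer || value instanceof Long) return "long";
        if (value instanceof Number) return "double";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Map) return "map";
        if (value instanceof List) return "bag";
        return "chararray"; // strings, nulls, and anything unrecognized
    }

    // Builds "field:type, field:type, ..." in the sample's field order.
    static String inferSchema(Map<String, Object> sample) {
        StringJoiner schema = new StringJoiner(", ");
        for (Map.Entry<String, Object> field : sample.entrySet()) {
            schema.add(field.getKey() + ":" + pigType(field.getValue()));
        }
        return schema.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample record; LinkedHashMap keeps insertion order.
        Map<String, Object> sample = new LinkedHashMap<>();
        sample.put("name", "example");
        sample.put("count", 3);
        sample.put("score", 1.5);
        sample.put("tags", Arrays.asList("a", "b"));
        System.out.println(inferSchema(sample));
        // name:chararray, count:long, score:double, tags:bag
    }
}
```

A field that is missing or null in the sample falls back to chararray here, which is exactly the kind of case where a single sample can mislead.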