Pig >> mail # user >> Reading json file.


I've also had poor experiences getting the default JSON loaders to work.
I would highly recommend writing your own JsonLoader UDF that extends
LoadFunc rather than, say, importing Twitter's Elephant Bird. A couple of
ideas here:

   - Use TextLoader
     <https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/TextLoader.java>
     as an example of how to implement the LoadFunc
     <https://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/LoadFunc.java?view=markup>
     abstract class well
   - Unit test the loader, so you can verify that the JSON parsing is
     performed exactly as desired
   - Also look into PigStorage
     <https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/PigStorage.java>
     to get an idea of how to extend FileInputLoadFunc and implement
     StoreFuncInterface well
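
The parse-then-unit-test idea from the list above can be tried outside Pig first. The snippet below is a minimal sketch, not a Pig LoadFunc: it is written in Python only to stay self-contained, and it assumes one JSON object per line. The names parse_record and FIELDS are hypothetical, the field names are copied from the schema in the quoted load statement, and the sample records are made up for illustration. A Java loader would likely keep the same shape, with the per-line parse living in its getNext() method.

```python
import json

# Field names copied from the schema in the quoted load statement.
FIELDS = ("id", "categories", "hostt", "ns", "rep")


def parse_record(line):
    """Parse one line of text as a JSON object and return a tuple of FIELDS.

    Returns None for blank lines, malformed JSON, or non-object values,
    so a caller can skip or count bad records instead of failing the job.
    Fields missing from the object come back as None.
    """
    line = line.strip()
    if not line:
        return None
    try:
        obj = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(obj, dict):
        return None
    return tuple(obj.get(name) for name in FIELDS)


def run_tests():
    """Tiny assertion-based checks; the sample records are invented."""
    good = '{"id": "a1", "categories": {"k": "v"}, "ns": {}, "rep": "r"}'
    assert parse_record(good) == ("a1", {"k": "v"}, None, {}, "r")
    # An unquoted token is not valid JSON, so the record is rejected.
    assert parse_record('{"id": "a1", "rep": Data}') is None
    assert parse_record("") is None
    assert parse_record("[1, 2, 3]") is None
    return True


if __name__ == "__main__":
    run_tests()
```

Keeping the parsing in a small pure function like this makes the tests independent of Hadoop, which is the main point of the sketch.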

The learning exercise on its own is the valuable part here, imho. It'll
allow for more agile development on future projects, at the cost of only
a few days of research and development up front.
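
Before writing a loader, it may be worth checking that the input really is one valid JSON value per line, since the JsonParseException quoted below complains about an unexpected character at line 1, column 50. Here is a rough sketch of such a check in Python, using only the standard library. The helper name find_bad_lines is hypothetical, and the idea that the loader parses each line as its own JSON document is an assumption, not something stated in the thread.

```python
import json


def find_bad_lines(lines):
    """Yield (line_number, column, message) for each line that is not valid JSON.

    Line numbers and columns are 1-based. Blank lines are skipped.
    """
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue
        try:
            json.loads(text)
        except json.JSONDecodeError as err:
            # colno is the 1-based column where the decoder gave up.
            yield number, err.colno, err.msg


if __name__ == "__main__":
    import sys

    # Usage: python check_json_lines.py sample_json.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        for number, column, message in find_bad_lines(handle):
            print(f"line {number}, column {column}: {message}")
```

The reported column can be compared with the one in the Jackson error, though the two parsers may not count columns identically.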

Hope this helps.

-Dan

On Fri, Aug 30, 2013 at 10:03 AM, Zhu Wayne <[EMAIL PROTECTED]> wrote:

> try twitter's jsonloader.
>
>
>
> On Fri, Aug 30, 2013 at 2:20 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > There are different JSON loaders available, but none of them worked for
> > me when I had to deal with JSON. I ended up loading the file as a text
> > file, reading one line at a time, and then parsing the JSON inside my
> > UDF with a Java JSON library.
> >
> > Best Regards,
> > Ruslan
> >
> >
> > On Fri, Aug 30, 2013 at 2:53 AM, jamal sasha <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Umm.. I am trying .. but somehow i am not able to get my head around
> > > this:
> > > a = load 'sample_json.json' using
> > > JsonLoader('id:chararray,categories:[chararray], hostt:{ (variable_a:
> > > {(first:int,last:int)})}, ns:[chararray],rep:chararray  ');
> > >
> > > But I get this error:
> > > org.codehaus.jackson.JsonParseException: Unexpected character ('D' (code 68)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
> > >  at [Source: java.io.ByteArrayInputStream@4795b8e9; line: 1, column: 50]
> > > at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
> > > at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
> > > at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
> > > at org.codehaus.jackson.impl.Utf8StreamParser._handleUnexpectedValue(Utf8StreamParser.java:1582)
> > > at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:386)
> > > at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:173)
> > > at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:157)
> > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
> > > at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> > > at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > >
> > >
> > >
> > > On Thu, Aug 29, 2013 at 3:22 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
> > >
> > > > Have you seen these?
> > > >
> > > > http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/builtin/JsonStorage.html
> > > >
> > > > http://hortonworks.com/blog/jsonize-anything-in-pig-with-tojson/
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > > >
> > > > On Thu, Aug 29, 2013 at 6:19 PM, jamal sasha <[EMAIL PROTECTED]>
> > > > wrote: