Pig, mail # user - How do I load JSON in Pig?


Re: How do I load JSON in Pig?
Saxifrage Cucvara 2012-11-21, 22:36
Thanks David.

However, I did try this. I can read fields at the first level of the JSON
file, but any lookup into a nested level fails.

I'm not sure whether the errors below help identify the problem:
2012-11-22 09:29:07,065 [Thread-39] WARN  org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
2012-11-22 09:29:07,065 [Thread-39] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0009
org.apache.pig.backend.executionengine.ExecException: ERROR 1081: Cannot cast to map. Expected bytearray but received: chararray
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:1422)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.processInput(POMapLookUp.java:87)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:98)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:117)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:320)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.ClassCastException
2012-11-22 09:29:07,199 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0009
2012-11-22 09:29:07,199 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-11-22 09:29:12,207 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0009 has failed! Stop running all dependent jobs
2012-11-22 09:29:12,207 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-11-22 09:29:12,207 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
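
One possible reading of the ERROR 1081 line: the failure is raised in POMapLookUp, so a map lookup was handed a chararray where it needed a map. That would be consistent with json#'data' arriving as a plain string instead of a nested map, although the trace alone does not prove it. A minimal sketch for checking this, reusing the jars and input path from the script quoted below and the '-nestedLoad' option from David's message. Whether elephant-bird 2.2.3 honours that option is an assumption, and data_only is a relation name made up for this check:

```pig
-- Jars and input path are taken from the script quoted below.
register elephant-bird-2.2.3.jar;
register json-simple-1.1.jar;

-- '-nestedLoad' is the option string suggested in this thread; support for it
-- in this elephant-bird version is assumed, not confirmed.
nestobject = LOAD '/Users/Path/GoogleDrive/test.json'
       USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
       AS (json:map[]);

-- Project only the nested field, with no second-level lookup yet.
data_only = FOREACH nestobject GENERATE json#'data' AS data;

-- Print the declared schema of the projected field.
DESCRIBE data_only;

-- Print the actual values to see what the loader produced for 'data'.
DUMP data_only;
```

If the dumped values appear in Pig's map notation ([key#value]), the nested load is probably taking effect and the problem lies elsewhere. If they appear as raw JSON text, the field is a string, and a further # lookup on it would be expected to fail with a cast error like the one above.
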

On 22 November 2012 01:25, David LaBarbera <[EMAIL PROTECTED]> wrote:

> Try
>
> com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
> This should allow access to nested objects as nested maps
> ($0#'level1'#'level2'#'level3' …)
>
> David
>
> On Nov 21, 2012, at 12:56 AM, Saxifrage Cucvara <[EMAIL PROTECTED]> wrote:
>
> > I'm also experiencing problems working with JSON objects in Pig.
> >
> > I have managed to load a log file in JSON format but can only query the
> > top-level objects.  Whenever I try to access anything nested, it fails.
> >
> > -- Register JARS
> > register elephant-bird-2.2.3.jar;
> > register json-simple-1.1.jar;
> >
> > -- Load data
> > nestobject = LOAD '/Users/Path/GoogleDrive/test.json'
> >        USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad=true')
> >        AS (json:map[]);
> > DUMP nestobject;
> >
> > -- Example query
> > tester = FOREACH nestobject GENERATE json#'event',json#'uid',
> > json#'data'#'expired_reason' as reason;
> > DUMP tester;
> >
> > The above fails ...
> >
> > Does anyone have any ideas?
> >
> > Thanks
> >
> > Sax
> >
> > On 20 November 2012 07:22, Deepak Tiwari <[EMAIL PROTECTED]> wrote: