Weird problem in Pig 0.10 with STOR'ing JSON and then LOADing it as PigStorage chararray
Russell Jurney 2012-06-22, 23:05
The script that has worked in the past is thus:

/* Load Avro emails */
emails = load '/me/tmp/emails_big' using AvroStorage();
emails = filter emails by message_id IS NOT NULL;

/* JSONify the emails for ElasticSearch */
store emails into '/tmp/emails.json' using JsonStorage();

/* LOAD JSON as single field for storage in ElasticSearch with Wonderpig */
json_emails = load '/tmp/emails.json' using PigStorage() AS (json_record:chararray);
store json_emails into 'es://email/email?id=message_id&json=true&size=1000' using ElasticSearch();

Now I get this error:

grunt> json_emails = load '/tmp/emails.json' AS (json_record:chararray);
2012-06-22 15:45:34,136 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema: left is "json_record:chararray", right is "message_id:chararray,thread_id:chararray,in_reply_to:chararray,subject:chararray,body:chararray,date:chararray,froms:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},ccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},bccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},reply_tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)}"
2012-06-22 15:45:34,136 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "json_record:chararray", right is "message_id:chararray,thread_id:chararray,in_reply_to:chararray,subject:chararray,body:chararray,date:chararray,froms:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},ccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},bccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},reply_tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)}"
    at org.apache.pig.newplan.logical.relational.LogicalSchema.merge(LogicalSchema.java:760)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:114)
    at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
    at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:219)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
    at org.apache.pig.PigServer$Graph.compile(PigServer.java:1635)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:490)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I tried copying the file from /tmp/emails.json to /tmp/json_emails and loading it then - but that doesn't work. I tried calling PigStorage() explicitly, and that doesn't work either.

How am I supposed to pull this off?

I figured it out:

grunt> rm /tmp/emails.json/.pig_header
grunt> rm /tmp/emails.json/.pig_schema

Then I can load my JSON as chararray. Interesting problem.

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
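
Note on the fix above: the two hidden files appear to be metadata that JsonStorage writes alongside its output, and PigStorage in Pig 0.10 picks up a stored .pig_schema when one is present. That would explain why the declared single-field schema clashes with the stored multi-field one. A possible alternative to deleting the files is to tell the loader to ignore the stored schema. This is only a sketch: it assumes the PigStorage in this install accepts the documented '-noschema' option, and it has not been run against this data.

```pig
/* Assumption: PigStorage(delimiter, options) with '-noschema' is available (Pig 0.10+). */
/* '\t' is PigStorage's default delimiter, passed explicitly so the options argument can follow. */
json_emails = load '/tmp/emails.json' using PigStorage('\t', '-noschema') AS (json_record:chararray);
```

As in the original script, PigStorage still splits each line on the delimiter, so the whole record lands in json_record only if the JSON lines contain no literal tab characters.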