Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # user >> Weird problem in Pig 0.10 with STOR'ing JSON and then LOADing it as PigStorage chararray


Copy link to this message
-
Weird problem in Pig 0.10 with STOR'ing JSON and then LOADing it as PigStorage chararray
The script that has worked in the past is thus:

/* Load Avro emails */
emails = load '/me/tmp/emails_big' using AvroStorage();
emails = filter emails by message_id IS NOT NULL;

/* JSONify the emails for ElasticSearch */
store emails into '/tmp/emails.json' using JsonStorage();

/* LOAD JSON as single field for storage in ElasticSearch with Wonderpig */
json_emails = load '/tmp/emails.json' using PigStorage() AS
(json_record:chararray);
store json_emails into 'es://email/email?id=message_id&json=true&size=1000'
using ElasticSearch();
Now I get this error:

grunt> json_emails = load '/tmp/emails.json' AS (json_record:chararray);

2012-06-22 15:45:34,136 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1031: Incompatable schema: left is "json_record:chararray", right is
"message_id:chararray,thread_id:chararray,in_reply_to:chararray,subject:chararray,body:chararray,date:chararray,froms:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},ccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},bccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},reply_tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)}"
2012-06-22 15:45:34,136 [main] ERROR org.apache.pig.tools.grunt.Grunt -
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031:
Incompatable schema: left is "json_record:chararray", right is
"message_id:chararray,thread_id:chararray,in_reply_to:chararray,subject:chararray,body:chararray,date:chararray,froms:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},ccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},bccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},reply_tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)}"
at
org.apache.pig.newplan.logical.relational.LogicalSchema.merge(LogicalSchema.java:760)
at
org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:114)
at
org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:219)
at
org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at
org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1635)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I tried copying the file from /tmp/emails.json to /tmp/json_emails and
loading it then - but that doesn't work.  I tried calling PigStorage()
explicitly, and that doesn't work either.

How am I supposed to pull this off?

I figured it out:

grunt> rm /tmp/emails.json/.pig_header
grunt> rm /tmp/emails.json/.pig_schema

Then I can load my JSON as chararray.  Interesting problem.

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB