-
Using AvroStorage() - IGZ Nick, 2011-12-13, 10:49
Hi all,

I want to keep the Pig script and the storage schema separate. Is it possible to do this in a clean way? The only way that has worked so far is:

AvroStorage('schema', '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}');

Even then, the whole schema has to be on one line. If I split it onto multiple lines, I get a MismatchException (93-3) or something like that. Is there no way to do AvroStorage('file', <hdfs path of schema file>) or something of that sort, or at least to specify the schema across multiple lines?

Thanks,
-
Re: Using AvroStorage() - Stan Rosenberg, 2011-12-13, 15:35
The following test script works for me:

============================================
A = load '$LOGS' using org.apache.pig.piggybank.storage.avro.AvroStorage();
describe A;

B = foreach A generate region as my_region, google_ip;

dump B;

store B into './output' using
org.apache.pig.piggybank.storage.avro.AvroStorage(
'{"debug": 5,
"schema": {"type": "record", "name": "test", "fields": [{"name": "my_region", "type": ["null", "string"]}, {"name": "ip", "type": ["null", "string"]}]}
}');
============================================================

Note you don't need to pass the first parameter, i.e., 'schema'; you can just pass a string formatted in JSON. If you're still getting MismatchException, please compile a small repro and send it to the list.

stan
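
A possible way to keep the schema in its own file while still passing a single-line JSON string, as in the script above, is to build the constructor argument outside Pig. The sketch below is only an illustration: the function name avro_storage_argument is made up here, and how the resulting string reaches the script (for example through Pig parameter substitution) is left open and would need checking against the Pig version in use.

```python
import json


def avro_storage_argument(schema_text, debug=None):
    """Build a one-line JSON argument of the form {"schema": ...}.

    schema_text is the (possibly pretty-printed) Avro schema as JSON text.
    debug, if given, is added under the "debug" key shown in the thread.
    This helper is hypothetical and not part of Pig or piggybank.
    """
    schema = json.loads(schema_text)  # fails early on malformed JSON
    options = {}
    if debug is not None:
        options["debug"] = debug
    options["schema"] = schema
    # Compact separators guarantee the result contains no newlines.
    return json.dumps(options, separators=(",", ":"))


if __name__ == "__main__":
    pretty = """
    {
      "name": "xyz",
      "type": "record",
      "fields": [{"name": "abc", "type": "string"}]
    }
    """
    print(avro_storage_argument(pretty))
```

The schema file can then be formatted freely, since only the compacted form is ever placed inside the quotes.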
-
Re: Using AvroStorage() - IGZ Nick, 2011-12-13, 18:17
Hi Stan,

Here is my pig script:

REGISTER avro-1.4.0.jar
REGISTER joda-time-1.6.jar
REGISTER json-simple-1.1.jar
REGISTER jackson-core-asl-1.5.5.jar
REGISTER jackson-mapper-asl-1.5.5.jar
REGISTER pig-0.9.1-SNAPSHOT.jar
REGISTER dwh-udf-0.1.jar
REGISTER piggybank.jar
REGISTER linkedin-pig-0.8.jar
REGISTER google-collect-1.0-rc2.jar;

A = LOAD '/user/hshankar/temp' USING PigStorage();RMF '/user/hshankar/out1';STORE A INTO '/user/hshankar/out1' USING org.apache.pig.piggybank.storage.avro.AvroStorage('{"type": "record", "name": "test", "fields": [{"name":"my_region", "type": "string"}]}');

On executing it, I get this error:

2011-12-13 18:16:35,133 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: MismatchedTokenException(93!=3)
Details at logfile: /export/home/hshankar/pig_scripts/pig_1323800194535.log

Log file contains:

Pig Stack Trace
---------------
ERROR 1200: Pig script failed to parse: MismatchedTokenException(93!=3)

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Pig script failed to parse: MismatchedTokenException(93!=3)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1652)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1597)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
        at org.apache.pig.Main.run(Main.java:553)
        at org.apache.pig.Main.main(Main.java:108)
Caused by: Failed to parse: Pig script failed to parse: MismatchedTokenException(93!=3)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1644)
        ... 9 more
Caused by: MismatchedTokenException(93!=3)
        at org.apache.pig.parser.AstValidator.recoverFromMismatchedToken(AstValidator.java:209)
        at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
        at org.apache.pig.parser.AstValidator.func_clause(AstValidator.java:3497)
        at org.apache.pig.parser.AstValidator.store_clause(AstValidator.java:4626)
        at org.apache.pig.parser.AstValidator.op_clause(AstValidator.java:970)
        at org.apache.pig.parser.AstValidator.general_statement(AstValidator.java:574)
        at org.apache.pig.parser.AstValidator.statement(AstValidator.java:396)
        at org.apache.pig.parser.AstValidator.query(AstValidator.java:306)
        at org.apache.pig.parser.QueryParserDriver.validateAst(QueryParserDriver.java:236)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:168)
        ... 10 more
===============================================================================
-
Re: Using AvroStorage() - Stan Rosenberg, 2011-12-13, 18:53
There is something syntactically wrong with your script. MismatchedTokenException seems to indicate that the semicolon character was expected (ttype==93). What happens if you replace the entire "STORE A ..." line with, say, "DUMP A"?
-
Re: Using AvroStorage() - IGZ Nick, 2011-12-13, 19:47
DUMP works as expected. If I write the exact same thing on one line, it works. I remember seeing a JIRA for this some time back, but I am not able to find it now.
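
Since the single-line form is reported to work, one stopgap is to keep the readable multi-line script under version control and flatten its quoted strings before handing it to Pig. The sketch below is an assumption-laden illustration, not a Pig feature: collapse_quoted_newlines is a made-up helper, it treats every '...' pair as a string literal, and it will misbehave on scripts with apostrophes in comments or with escaped quotes. It also turns any run of whitespace inside a literal (including a literal tab delimiter) into a single space.

```python
import re


def collapse_quoted_newlines(script):
    """Join each single-quoted literal in a Pig script onto one line.

    Hypothetical preprocessing helper. Whitespace runs inside a literal,
    newlines included, are replaced by one space; text outside quotes is
    left untouched.
    """
    def _flatten(match):
        # str.split() with no argument splits on any whitespace run.
        return " ".join(match.group(0).split())

    return re.sub(r"'[^']*'", _flatten, script)


if __name__ == "__main__":
    multi_line = (
        "STORE A INTO '/tmp/out' USING AvroStorage('{\"type\": \"record\",\n"
        "  \"name\": \"test\"}');\n"
    )
    print(collapse_quoted_newlines(multi_line))
```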
-
Re: Using AvroStorage() - Daniel Dai, 2012-01-02, 08:55
The Jira is PIG-1749. It should be in trunk as well. Open a ticket if you cannot make a particular version work.

Daniel
-
Re: Using AvroStorage() - Stan Rosenberg, 2011-12-13, 20:03
It works for me with 0.9.1. Not sure what else it could be; '\r' if you're on Windows? Can you confirm that you don't have any funny newline characters, e.g., using 'od -h'?
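
As an alternative to inspecting a hex dump by eye, the check for stray '\r' characters can be scripted. The following is a minimal sketch; the function name and the idea of reporting line numbers are choices made for this example, not anything from Pig.

```python
from typing import List


def lines_with_carriage_return(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines that contain a '\\r' byte.

    data is the raw content of the script, read in binary mode so that
    Python does not translate line endings before the check.
    """
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if b"\r" in line
    ]


if __name__ == "__main__":
    sample = b"A = LOAD 'x';\r\nDUMP A;\n"
    print(lines_with_carriage_return(sample))
```

For a real script, the bytes would come from something like open(path, "rb").read(); an empty result means no carriage returns were found.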
-
Re: Using AvroStorage() - Bill Graham, 2011-12-13, 16:59
Yes, you can reference an Avro schema file in HDFS with the "schema_file" param. See TestAvroStorage.testRecordWithFieldSchemaFromTextWithSchemaFile here for an example:

http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
-
Re: Using AvroStorage() - IGZ Nick, 2011-12-13, 18:15
Hi Bill,

I tried schema_file but I get this error:

grunt> STORE A INTO '/user/hshankar/out1' USING org.apache.pig.piggybank.storage.avro.AvroStorage ('{"schema_file": "/user/hshankar/schema1.schema"}');
2011-12-13 18:06:00,879 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
Details at logfile: /export/home/hshankar/pig_scripts/pig_1323798959597.log

This is what the logfile contains:

===============================================================================
Pig Stack Trace
---------------
ERROR 1200: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'

Failed to parse: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1622)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1595)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
        at org.apache.pig.Main.run(Main.java:487)
        at org.apache.pig.Main.main(Main.java:108)
Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:492)
        at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:699)
        at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:688)
        at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:3956)
        at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:5450)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1041)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:638)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:459)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:357)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:171)
        ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:482)
        ... 19 more
Caused by: java.io.IOException: Invalid parameter:schema_file
        at org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:601)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:518)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.<init>(AvroStorage.java:433)
        ... 24 more
===============================================================================

I am using pig version 0.9.1-SNAPSHOT
-
Re: Using AvroStorage() - Bill Graham, 2011-12-13, 18:51
You still need to map the Tuple fields to the avro schema fields. See the unit test for an example, or section 4.C of the documentation. It reads the schema from a data file, but the same approach is used when using schema_file instead.

https://cwiki.apache.org/confluence/display/PIG/AvroStorage
-
Re: Using AvroStorage() - IGZ Nick, 2011-12-13, 19:45
Ah, OK. Isn't there anything that would take the elements in order as they are? Mapping each field would lead to almost the same coupling between the schema file and the Pig script that I am trying to avoid.
-
Re: Using AvroStorage() Bill Graham 2011-12-15, 00:17
AFAIK, there is no way to implicitly map tuple fields to those loaded from a schema file.

On Tue, Dec 13, 2011 at 11:45 AM, IGZ Nick <[EMAIL PROTECTED]> wrote:
> ah ok.. Isn't there anything that would take the elements in order as it
> is? Because mapping each field would almost lead to the same coupling
> between the schema file and the pig script which I am trying to avoid
>
>
> On Wed, Dec 14, 2011 at 12:21 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
>
>> You still need to map the Tuple fields to the avro schema fields. See the
>> unit test for an example, or section 4.C of the documentation. It reads the
>> schema from a data file, but the same approach is used when using
>> schema_file instead.
>>
>> https://cwiki.apache.org/confluence/display/PIG/AvroStorage
>>
>>
>> On Tue, Dec 13, 2011 at 10:15 AM, IGZ Nick <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Bill,
>>>
>>> I tried schema_file but I get this error:
>>>
>>> grunt> STORE A INTO '/user/hshankar/out1' USING
>>> org.apache.pig.piggybank.storage.avro.AvroStorage ('{"schema_file":
>>> "/user/hshankar/schema1.schema"}');
>>> 2011-12-13 18:06:00,879 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>> ERROR 1200: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>>> Details at logfile:
>>> /export/home/hshankar/pig_scripts/pig_1323798959597.log
>>>
>>> This is what the logfile contains:
>>>
>>> ===============================================================================
>>> Pig Stack Trace
>>> ---------------
>>> ERROR 1200: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>>>
>>> Failed to parse: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>>>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
>>>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1622)
>>>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1595)
>>>     at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
>>>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>>>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>>>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
>>>     at org.apache.pig.Main.run(Main.java:487)
>>>     at org.apache.pig.Main.main(Main.java:108)
>>> Caused by: java.lang.RuntimeException: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>>>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:492)
>>>     at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:699)
>>>     at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:688)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:3956)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:5450)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1041)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:638)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:459)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:357)
>>>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:171)
Bill Graham 2011-12-15, 00:17
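Since the thread concludes that the tuple-to-schema mapping has to be explicit, one way to limit the coupling is to generate that mapping from the schema file rather than writing it by hand. The following is only a rough sketch and not something posted in the thread: the helper name `build_avro_storage_args` is hypothetical, and the `"field<N>": "def:<name>"` keys are an assumption about the mapping syntax described in section 4.C of the AvroStorage wiki, so they should be checked against the piggybank version in use. The schema and path are the ones quoted earlier in the thread.

```python
import json


def build_avro_storage_args(schema_json: str, schema_path: str) -> str:
    """Build the JSON argument string for AvroStorage (hypothetical helper).

    Tuple positions are mapped to schema field names in declaration order.
    """
    schema = json.loads(schema_json)
    args = {"schema_file": schema_path}
    for position, field in enumerate(schema["fields"]):
        # ASSUMPTION: "field<N>": "def:<name>" is the mapping form from
        # section 4.C of the AvroStorage wiki; verify before relying on it.
        args[f"field{position}"] = f"def:{field['name']}"
    return json.dumps(args)


# Schema text as it would be read from the schema file.
schema_text = (
    '{"name":"xyz","type":"record",'
    '"fields":[{"name":"abc","type":"string"}]}'
)
arg = build_avro_storage_args(schema_text, "/user/hshankar/schema1.schema")
print(arg)
```

The printed string would then go into the AvroStorage(...) call of the STORE statement, so the Pig script itself never lists the field names. This does not remove the ordering dependency: the tuple fields still have to be generated in the same order as the fields in the schema file.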
|