-
How can I read Hive text files on S3 from Pig?
Martin Goodson 2012-10-12, 15:48
I am trying to load some text files in hive partitions on S3 using the AllLoader function with no success. I get an error which indicates that AllLoader is expecting the files to be on hdfs:
a = LOAD 's3n://xxxxx/yyyyy/zzz' using org.apache.pig.piggybank.storage.AllLoader();
grunt> 2012-10-12 14:51:26,229 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Wrong FS: s3n://xxxxx/yyyyy/zzz, expected: hdfs://namenode.hadoop.companyname.com
Reading the files with PigStorage works fine, but PigStorage is not aware of the Hive partition structure, so I cannot query the data using this method (I have to specify the file manually):
a = LOAD 's3n://xxxxx/yyyyy/zzzZ' using PigStorage();
Is there a way of reading hive partitions from pig over S3?
hive-0.9.0
pig-0.10.0
hadoop-0.20
Thank you
Martin
-
Re: How can I read Hive text files on S3 from Pig?
Dmitriy Ryaboy 2012-10-12, 17:56
Martin,
Do you have the complete stack trace? Generally, for Hive interop I recommend HCatalog; AllLoader is neat, but it's a third-party contrib and we don't really know it too well. I can check out the error dump and see if there's anything obvious, though.
D
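For readers following the HCatalog suggestion, here is a minimal sketch of what reading a partitioned Hive table from Pig could look like. The table name `mydb.web_logs` and the partition column `dt` are hypothetical. The loader class name assumes an HCatalog release from the same era as hive-0.9.0; later releases ship it as org.apache.hive.hcatalog.pig.HCatLoader. The HCatalog and Hive jars are also assumed to be on Pig's classpath already, and how that is done depends on the installation.

```pig
-- Hypothetical table and partition column; substitute your own.
-- HCatLoader takes a Hive table name rather than a filesystem path;
-- the data location (here, on S3) comes from the Hive metastore.
logs = LOAD 'mydb.web_logs' USING org.apache.hcatalog.pig.HCatLoader();

-- Partition columns show up as ordinary fields. A filter on a partition
-- column placed directly after the LOAD lets HCatalog read only the
-- matching partitions instead of scanning the whole table.
recent = FILTER logs BY dt == '2012-10-12';

DUMP recent;
```

Because the partition layout is resolved through the metastore, no individual partition directory or file has to be named in the script, which is the limitation described above for PigStorage.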
-
Re: How can I read Hive text files on S3 from Pig?
Martin Goodson 2012-10-13, 09:53
Hi Dmitriy,
here is the stack trace:
java.lang.IllegalArgumentException: Wrong FS: s3n://xxx/yyy/zz/, expected: hdfs://namenode.adsf.companyname.com
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:433)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:523)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:820)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
    at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:316)
    at org.apache.pig.piggybank.storage.JsonMetadata.findMetaFile(JsonMetadata.java:94)
    at org.apache.pig.piggybank.storage.JsonMetadata.getSchema(JsonMetadata.java:154)
    at org.apache.pig.piggybank.storage.AllLoader.getSchema(AllLoader.java:400)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
    at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
    at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
    at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:495)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Thanks for taking a look. I will start looking into HCatalog too.
Martin
-
Re: How can I read Hive text files on S3 from Pig?
Dmitriy Ryaboy 2012-10-18, 04:15
Yeah, that's a bug in FileLocalizer; apparently it assumes the path is on either the local filesystem or HDFS. Could you file a JIRA?
D
-
Re: How can I read Hive text files on S3 from Pig?
Martin Goodson 2012-10-18, 12:22
Sure - thanks for having a look. By the way, I've moved to HCatalog and things look like they are working.
Thanks again
Martin
-
Re: How can I read Hive text files on S3 from Pig?
Dmitriy Ryaboy 2012-10-18, 20:59
The same underlying class is used by PigStorage in Pig 0.11, so we should clean this up to make S3 users happy.
D