Sqoop >> mail # user >> Sqoop and S3


Jurgen Van Gael 2013-02-11, 21:01
Jarek Jarcec Cecho 2013-02-12, 16:12
Re: Sqoop and S3
Sure thing. I just added it as SQOOP-891 [1]; let me know if there is anything I need to change.

Jurgen

Links:
1: https://issues.apache.org/jira/browse/SQOOP-891

On Tuesday, 12 February 2013 at 16:12, Jarek Jarcec Cecho wrote:

> Hi Jurgen,
> I believe your investigation is going in a very good direction; the MAPREDUCE-1806 fix indeed does not seem to have been ported into Sqoop. Would you mind opening a JIRA for that [1]?
>
> Jarcec
>
> Links:
> 1: https://issues.apache.org/jira/browse/SQOOP
>
> On Mon, Feb 11, 2013 at 09:01:01PM +0000, Jurgen Van Gael wrote:
> > Hi,
> >
> > I recently tried to use Sqoop to export a Hive table that lives on S3 into my MySQL server:
> >
> > sqoop export --options-file config.txt --table _universe --export-dir s3n://key:secret@mybucket/universe --input-fields-terminated-by '\0001' -m 1 --input-null-string '\\N' --input-null-non-string '\\N'
> >
> > My Sqoop runs on a CDH4 cluster on EC2. I was getting errors such as the following:
> >
> > 13/02/11 17:37:15 ERROR security.UserGroupInformation: PriviledgedActionException as:XXX (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /universe/000000_0.snappy
> > 13/02/11 17:37:15 ERROR tool.ExportTool: Encountered IOException running export job: java.io.FileNotFoundException: File does not exist: /universe/000000_0.snappy
> >
> > Since the files do exist on S3, I was reminded of getting the same errors when running Hive queries against this table. Hive was failing back then because of a bug in CombineFileInputFormat when it is used against a non-default file system. These issues have since been fixed in Hadoop:
> > https://issues.apache.org/jira/browse/MAPREDUCE-1806
> > https://issues.apache.org/jira/browse/MAPREDUCE-2704
> >
> >
> > I believe Sqoop uses a version of CombineFileInputFormat but, as far as I can tell from the latest sources in Git, it hasn't incorporated the above fixes. My questions for the user group:
> > Am I completely off in my investigations?
> > Is there something I am missing in configuring Sqoop for exporting from S3?
> > Is there a way for me to bypass the CombineFileInputFormat so I can make my exports work?
> >
> > Many thanks,
> >
> > Jurgen
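
The failure described in the original message can be illustrated with a small, self-contained sketch. It is not code from Sqoop or Hadoop and does not use their APIs. It only assumes that the problem behind MAPREDUCE-1806 is that the scheme and authority of an input path get dropped during split generation, so the remaining bare path is later looked up on the default file system. The class name, the helper methods and the DEFAULT_FS value are hypothetical; the bucket and file names are taken from the message above.

```java
import java.net.URI;

public class SchemeStrippingSketch {

    // Hypothetical stand-in for the cluster's default file system (an assumption).
    static final String DEFAULT_FS = "hdfs://namenode:8020";

    /** Assumed buggy pattern: keep only the path, dropping scheme and authority. */
    static String pathOnly(String location) {
        return URI.create(location).getPath();
    }

    /** A location without a scheme falls back to the default file system. */
    static String resolve(String location) {
        URI uri = URI.create(location);
        if (uri.getScheme() == null) {
            return DEFAULT_FS + uri.getPath();
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        String input = "s3n://mybucket/universe/000000_0.snappy";
        String stripped = pathOnly(input);

        // The stripped form matches the path shown in the FileNotFoundException.
        System.out.println("stripped path:        " + stripped);
        // Looked up on the default file system, where the file does not exist.
        System.out.println("resolved (stripped):  " + resolve(stripped));
        // Keeping the fully qualified location still points at S3.
        System.out.println("resolved (qualified): " + resolve(input));
    }
}
```

Under that assumption, the stripped path is /universe/000000_0.snappy, which is the path reported in the error log even though the export directory was given as an s3n:// URI.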
