Sqoop >> mail # dev >> Review Request: SQOOP-931 - Integration of Sqoop and HCatalog


Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > Hi Venkat,
> > Thank you for incorporating my comments, greatly appreciated. I took a deep look again and I have the following additional comments:
> >
> > 1) Can we add the HCatalog tests to the ThirdPartyTests suite? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java
> >
> > 2) It seems that using --create-hcatalog-table will create the table and exit Sqoop without doing the import:
> >
> > [root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table
> > 13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> > 13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
> > 13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
> > 13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
> > Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or overrides a deprecated API.
> > Note: Recompile with -Xlint:deprecation for details.
> > 13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
> > 13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from mysql.
> > 13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
> > 13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
> > 13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
> > 13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail if needed jar files are not found correctly.  Please set HIVE_HOME in sqoop-env.sh or provide --hive-home option.  Setting HIVE_HOME  to /usr/lib/hive
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may fail if needed jar files are not found correctly.  Please set HCAT_HOME in sqoop-env.sh or provide --hcatalog-home option.   Setting HCAT_HOME to /usr/lib/hcatalog
> > 13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, txt]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map :
> >         Names: [id, txt]
> >         Types : [4, 12]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text for import
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement:
> >
> > create table default.text (
> >         id int,
> >         txt string)
> > stored as rcfile
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process.
> > Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
> > OK
> > Time taken: 25.121 seconds
> > [root@bousa-hcat ~]#
> >
> >

Sure, I can add it to that.

--create-hcatalog-table: It seems to work by chance. That is, after creating the table, a bunch of stuff is done that is not needed. I will add additional checks there.

> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:

Good point. Since I modified the Hive unit tests to function correctly in the presence of a real Hive environment, this can easily be done.

Yes. One thing to note is that by moving isHCatJob to the parent class we lost the ability to mark it as final. Let me rework it.

Please see above

Yes, it is needed for debugging purposes, when we want to know when the sub record reader or the main record reader is called.

Sure

Yes. The message needs fixing.

Yes. As above.

Good point. I think we will otherwise earlier, but for consistency I think we should do this. Will change.

Hive and HCatalog configuration files and jars have to be on the classpath brought in by hcat -classpath. Do you think that is not always in the configuration? When I update the configure-sqoop script, I will make sure the Hive conf is added.
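
For illustration only, here is a minimal sketch of how a launcher script could pick up that classpath. It is not the actual Sqoop script: the function names append_classpath and add_hcat_classpath are hypothetical, and it assumes that hcat is on the PATH, that the Hive configuration files live in $HIVE_HOME/conf, and that HADOOP_CLASSPATH is the variable the job launcher reads.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the Sqoop source tree.

# Append one entry to the colon-separated HADOOP_CLASSPATH; ignore empty entries.
append_classpath() {
  [ -n "$1" ] || return 0
  if [ -n "${HADOOP_CLASSPATH:-}" ]; then
    HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:$1"
  else
    HADOOP_CLASSPATH="$1"
  fi
}

# Add the HCatalog classpath and the Hive conf directory when they are available.
add_hcat_classpath() {
  # Use whatever "hcat -classpath" reports, but only if the hcat launcher exists.
  if command -v hcat >/dev/null 2>&1; then
    append_classpath "$(hcat -classpath 2>/dev/null)"
  fi
  # Assumption: the Hive configuration files are in $HIVE_HOME/conf.
  if [ -n "${HIVE_HOME:-}" ] && [ -d "${HIVE_HOME}/conf" ]; then
    append_classpath "${HIVE_HOME}/conf"
  fi
  export HADOOP_CLASSPATH
}
```

Calling add_hcat_classpath before launching the job would leave HADOOP_CLASSPATH unchanged when neither hcat nor a Hive conf directory is found, so the existing warnings about HIVE_HOME and HCAT_HOME would still be the first sign of a missing setup.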

Yes. Will fix.

Good catch. Earlier I had the ability to execute a command line, but removed it in favor of a simpler model. Will remove it.

Sure, will change.

Sure, will do.

Sure, will do.

Sure, will do.

- Venkat

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
On June 3, 2013, 4:16 a.m., Venkat Ranganathan wrote: