Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
-----------------------------------------------------------
Hi Venkat,
Thank you for incorporating my comments, greatly appreciated. I've taken a deep look again and I have the following additional comments:

1) Can we add the HCatalog tests into the ThirdPartyTests suite (see the sketch after the log below)? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java

2) It seems that using --create-hcatalog-table will create the table and then exit Sqoop without doing the import:

[root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table
13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail if needed jar files are not found correctly.  Please set HIVE_HOME in sqoop-env.sh or provide --hive-home option.  Setting HIVE_HOME  to /usr/lib/hive
13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may fail if needed jar files are not found correctly.  Please set HCAT_HOME in sqoop-env.sh or provide --hcatalog-home option.   Setting HCAT_HOME to /usr/lib/hcatalog
13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, txt]
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map :
        Names: [id, txt]
        Types : [4, 12]
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text for import
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement:

create table default.text (
        id int,
        txt string)
stored as rcfile
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process.
Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
OK
Time taken: 25.121 seconds
[root@bousa-hcat ~]#
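
Regarding 1) above: a minimal sketch of how the HCatalog tests could be wired into the ThirdPartyTests suite. The suite uses JUnit 3 style; SqoopHCatalogImportTest is a hypothetical name standing in for whichever test classes this patch adds.

    import junit.framework.Test;
    import junit.framework.TestSuite;

    // Fragment of ThirdPartyTests.java (sketch only).
    public static Test suite() {
      TestSuite suite = new TestSuite("Tests dependent on third-party software");
      // ... existing addTestSuite(...) calls for MySQL, Oracle, etc. ...
      suite.addTestSuite(SqoopHCatalogImportTest.class); // register the HCatalog tests
      return suite;
    }
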
src/docs/user/hcatalog.txt
<https://reviews.apache.org/r/10688/#comment44346>

    Can we add information here about what will happen if the table already exists and this parameter is specified?

src/docs/user/hcatalog.txt
<https://reviews.apache.org/r/10688/#comment44347>

    This seems unnecessary; can we tweak the bash scripts to do this automatically if the hcat command is present?

src/java/org/apache/sqoop/mapreduce/ExportJobBase.java
<https://reviews.apache.org/r/10688/#comment44350>

    Nit: I think that this line can also be refactored into the parent class, right?

src/java/org/apache/sqoop/mapreduce/ImportJobBase.java
<https://reviews.apache.org/r/10688/#comment44349>

    Nit: I think that this line can also be refactored into the parent class, right?
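
    To illustrate the nit raised here and on ExportJobBase above: a sketch of the refactor pattern, hoisting a statement duplicated in both subclasses into their shared parent JobBase. The method name commonJobSetup is a placeholder, since the review comment points at a specific line not quoted in this e-mail.

        // Sketch of the suggested refactor: ImportJobBase and ExportJobBase
        // both extend JobBase, so a statement duplicated in both can be
        // hoisted into a shared protected method on the parent.
        protected void commonJobSetup(Job job) throws IOException {
          // the line currently duplicated in ImportJobBase and
          // ExportJobBase would go here (not quoted in this e-mail)
        }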

src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java
<https://reviews.apache.org/r/10688/#comment44351>

    This method seems to be required only for the debug message. Is that the only reason, or did I miss something?

src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44369>

    Nit: It seems that we are doing options = opts; in every case, so maybe it would be worth putting this line before the "if" statement?
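
    A sketch of that nit; the predicate is left abstract because the actual "if" condition is not quoted in this e-mail:

        // Before (sketch): the assignment is repeated in every branch.
        if (somePredicate) {        // placeholder condition
          options = opts;
          // branch-specific work ...
        } else {
          options = opts;
          // other branch-specific work ...
        }

        // After: hoist the shared assignment above the "if".
        options = opts;
        if (somePredicate) {
          // branch-specific work ...
        } else {
          // other branch-specific work ...
        }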

src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44371>

    Nit: Shouldn't the default Hive home be in SqoopOptions.getDefaultHiveHome()?

src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44372>

    Shouldn't the default HCatalog home be in SqoopOptions.getDefaultHcatHome()?
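
    To illustrate the two nits above, a sketch of what centralized defaults in SqoopOptions could look like. The /usr/lib/* fallbacks mirror the values printed in the log above; the method bodies are assumptions, not code from the patch.

        // Sketch only: candidate additions to SqoopOptions.
        public static String getDefaultHiveHome() {
          String home = System.getenv("HIVE_HOME");
          return (home != null) ? home : "/usr/lib/hive";
        }

        public static String getDefaultHcatHome() {
          String home = System.getenv("HCAT_HOME");
          return (home != null) ? home : "/usr/lib/hcatalog";
        }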

src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44397>

    Both Hive and HBase are idempotent when creating tables, so it might make sense to add "IF NOT EXISTS" here in order to remain consistent.
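
    For example, a sketch of how the generated DDL from the log above would change (the string assembly here is illustrative, not the actual SqoopHCatUtilities code):

        // Sketch: emit "if not exists" so that re-running with
        // --create-hcatalog-table is idempotent, consistent with Hive and HBase.
        String qualifiedName = "default.text";       // table name from the log above
        String columns = "\tid int,\n\ttxt string";  // column list from the log above
        String ddl = "create table if not exists " + qualifiedName + " (\n"
            + columns + ")\nstored as rcfile";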

src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44395>

    It seems that at this point we are not reading the Hive configuration files, yet we are executing the in-process Hive CLI. As a result, it will not pick up the configuration file and will use the defaults instead.
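
    A sketch of one way to address this, explicitly adding hive-site.xml to the configuration before the in-process CLI runs; the HIVE_CONF_DIR handling and placement are assumptions, not the patch's actual code:

        // Sketch: load hive-site.xml into the job Configuration so the
        // in-process Hive CLI does not silently fall back to defaults.
        String confDir = System.getenv("HIVE_CONF_DIR");
        if (confDir == null) {
          confDir = hiveHome + File.separator + "conf"; // hiveHome resolved earlier
        }
        File hiveSite = new File(confDir, "hive-site.xml");
        if (hiveSite.exists()) {
          // org.apache.hadoop.fs.Path; Configuration.addResource(Path)
          // is a standard Hadoop API.
          configuration.addResource(new Path(hiveSite.toURI()));
        } else {
          LOG.warn("hive-site.xml not found under " + confDir
              + "; the in-process Hive CLI will use default settings");
        }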