Sqoop >> mail # user >> import from Oracle to Hive : 2 errors

Thread:
Jérôme Verdier          2013-06-17, 13:46
Jarek Jarcec Cecho      2013-06-17, 14:58
Jérôme Verdier          2013-06-17, 15:31
Jarek Jarcec Cecho      2013-06-17, 16:54
Jérôme Verdier          2013-06-17, 15:59
Jarek Jarcec Cecho      2013-06-17, 16:56

Re: import from Oracle to Hive : 2 errors
Hi Jarcec,

Thanks for your explanations, they helped me understand how Sqoop works.

I'm trying to import 1000 rows from a fairly big Oracle table, which is
partitioned to keep query times reasonable.

I am using this Sqoop command, with a query that selects only the first 1000
rows:

sqoop import --connect jdbc:oracle:thin:@xx.xx.xx.xx:1521/D_BI \
  --username xx --password xx \
  --create-hive-table \
  --query 'SELECT * FROM DT_PILOTAGE.DEMARQUE_MAG_JOUR WHERE ROWNUM<1000 AND $CONDITIONS' \
  --target-dir /home/hduser \
  --split-by DEMARQUE_MAG_JOUR.CO_SOCIETE \
  --hive-table default.DEMARQUE_MAG_JOUR

The M/R job seems to run fine, but as we can see in the output below, the data
is not moved into Hive.

Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
13/06/18 12:05:21 WARN tool.BaseSqoopTool: Setting your password on the
command-line is insecure. Consider using -P instead.
13/06/18 12:05:21 WARN tool.BaseSqoopTool: It seems that you've specified
at least one of following:
13/06/18 12:05:21 WARN tool.BaseSqoopTool:      --hive-home
13/06/18 12:05:21 WARN tool.BaseSqoopTool:      --hive-overwrite
13/06/18 12:05:21 WARN tool.BaseSqoopTool:      --create-hive-table
13/06/18 12:05:21 WARN tool.BaseSqoopTool:      --hive-table
13/06/18 12:05:21 WARN tool.BaseSqoopTool:      --hive-partition-key
13/06/18 12:05:21 WARN tool.BaseSqoopTool:      --hive-partition-value
13/06/18 12:05:21 WARN tool.BaseSqoopTool:      --map-column-hive
13/06/18 12:05:21 WARN tool.BaseSqoopTool: Without specifying parameter
--hive-import. Please note that
13/06/18 12:05:21 WARN tool.BaseSqoopTool: those arguments will not be used
in this session. Either
13/06/18 12:05:21 WARN tool.BaseSqoopTool: specify --hive-import to apply
them correctly or remove them
13/06/18 12:05:21 WARN tool.BaseSqoopTool: from command line to remove this
warning.
13/06/18 12:05:21 INFO manager.SqlManager: Using default fetchSize of 1000
13/06/18 12:05:21 INFO tool.CodeGenTool: Beginning code generation
13/06/18 12:05:40 INFO manager.OracleManager: Time zone has been set to GMT
13/06/18 12:05:40 INFO manager.SqlManager: Executing SQL statement: SELECT
* FROM DT_PILOTAGE.DEMARQUE_MAG_JOUR WHERE ROWNUM<1000 AND  (1 = 0)
13/06/18 12:05:40 INFO manager.SqlManager: Executing SQL statement: SELECT
* FROM DT_PILOTAGE.DEMARQUE_MAG_JOUR WHERE ROWNUM<1000 AND  (1 = 0)
13/06/18 12:05:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
/usr/local/hadoop
Note:
/tmp/sqoop-hduser/compile/b2b0decece541a7abda95580d7b1f0d2/QueryResult.java
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/18 12:05:41 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-hduser/compile/b2b0decece541a7abda95580d7b1f0d2/QueryResult.jar
13/06/18 12:05:41 INFO mapreduce.ImportJobBase: Beginning query import.
13/06/18 12:05:42 INFO db.DataDrivenDBInputFormat: BoundingValsQuery:
SELECT MIN(t1.CO_SOCIETE), MAX(t1.CO_SOCIETE) FROM (SELECT * FROM
DT_PILOTAGE.DEMARQUE_MAG_JOUR WHERE ROWNUM<1000 AND  (1 = 1) ) t1
13/06/18 12:05:42 WARN db.BigDecimalSplitter: Set BigDecimal splitSize to
MIN_INCREMENT
13/06/18 12:05:42 INFO mapred.JobClient: Running job: job_201306180922_0005
13/06/18 12:05:43 INFO mapred.JobClient:  map 0% reduce 0%
13/06/18 12:05:50 INFO mapred.JobClient:  map 100% reduce 0%
13/06/18 12:05:51 INFO mapred.JobClient: Job complete: job_201306180922_0005
13/06/18 12:05:51 INFO mapred.JobClient: Counters: 18
13/06/18 12:05:51 INFO mapred.JobClient:   Job Counters
13/06/18 12:05:51 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6570
13/06/18 12:05:51 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
13/06/18 12:05:51 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
13/06/18 12:05:51 INFO mapred.JobClient:     Launched map tasks=1
13/06/18 12:05:51 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/06/18 12:05:51 INFO mapred.JobClient:   File Output Format Counters
13/06/18 12:05:52 INFO mapred.JobClient:     Bytes Written=174729
13/06/18 12:05:52 INFO mapred.JobClient:   FileSystemCounters
13/06/18 12:05:52 INFO mapred.JobClient:     HDFS_BYTES_READ=147
13/06/18 12:05:52 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=61242
13/06/18 12:05:52 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=174729
13/06/18 12:05:52 INFO mapred.JobClient:   File Input Format Counters
13/06/18 12:05:52 INFO mapred.JobClient:     Bytes Read=0
13/06/18 12:05:52 INFO mapred.JobClient:   Map-Reduce Framework
13/06/18 12:05:52 INFO mapred.JobClient:     Map input records=999
13/06/18 12:05:52 INFO mapred.JobClient:     Physical memory (bytes)
snapshot=43872256
13/06/18 12:05:52 INFO mapred.JobClient:     Spilled Records=0
13/06/18 12:05:52 INFO mapred.JobClient:     CPU time spent (ms)=830
13/06/18 12:05:52 INFO mapred.JobClient:     Total committed heap usage
(bytes)=16252928
13/06/18 12:05:52 INFO mapred.JobClient:     Virtual memory (bytes)
snapshot=373719040
13/06/18 12:05:52 INFO mapred.JobClient:     Map output records=999
13/06/18 12:05:52 INFO mapred.JobClient:     SPLIT_RAW_BYTES=147
13/06/18 12:05:52 INFO mapreduce.ImportJobBase: Transferred 170,6338 KB in
10,6221 seconds (16,0641 KB/sec)
13/06/18 12:05:52 INFO mapreduce.ImportJobBase: Retrieved 999 records.

Why doesn't Sqoop move the data into Hive? Is there a problem with the
partitioned table?
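
Re-reading the BaseSqoopTool warning at the top of the log, it looks like
--create-hive-table and --hive-table are ignored unless --hive-import is also
given. Should I retry with something like the command below (untested, same
placeholder connection values as above, only --hive-import added)?

sqoop import --connect jdbc:oracle:thin:@xx.xx.xx.xx:1521/D_BI \
  --username xx --password xx \
  --hive-import --create-hive-table \
  --query 'SELECT * FROM DT_PILOTAGE.DEMARQUE_MAG_JOUR WHERE ROWNUM<1000 AND $CONDITIONS' \
  --target-dir /home/hduser \
  --split-by DEMARQUE_MAG_JOUR.CO_SOCIETE \
  --hive-table default.DEMARQUE_MAG_JOUR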

Thanks.
2013/6/17 Jarek Jarcec Cecho <[EMAIL PROTECTED]>
*Jérôme VERDIER*
06.72.19.17.31
[EMAIL PROTECTED]
Follow-ups:
Venkat                  2013-06-18, 13:38
Jérôme Verdier          2013-06-18, 14:02
Jérôme Verdier          2013-06-18, 14:14