I have not re-configured Hive; I am using the default
settings/locations. I am using --hive-home to tell Sqoop where to find
Hive.
Here are the locations of my Sqoop, Hive and Hadoop instances.
And here are a few more details from running it with --verbose.
I am using the following command to import into Hive:
ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop
import --connect jdbc:mysql://localhost/ClassicModels --table Customers
-m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin --hive-import
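For reference, the verbose run is just the same command with Sqoop's
--verbose flag added:

./bin/sqoop import --verbose --connect jdbc:mysql://localhost/ClassicModels \
    --table Customers -m 1 \
    --hive-home /root/siddharth/tools/hive-0.11.0-bin --hive-import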
Verbose output of that run:
After running this command, here is what I see in HDFS and Hive.

HDFS
====
ssk01:~/siddharth/tools/hadoop-1.1.2 # bin/hadoop fs -ls
Found 2 items
-rw-r--r-- 1 root supergroup 0 2013-07-04 00:41
-rw-r--r-- 1 root supergroup 15569 2013-07-04 00:41
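To sanity-check what actually landed in HDFS, the part file can be read
directly (assuming Sqoop's usual part naming under the target directory
/user/root/Customers seen in the LOAD DATA query below):

# -m 1 means a single map task, so the data should be in one part-m-00000 file
bin/hadoop fs -cat /user/root/Customers/part-m-00000 | head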
Hive (I am running Hive from its own directory, so its metadata should be accessible)
==========================================================
ssk01:~/siddharth/tools/hive-0.11.0-bin # ./bin/hive
Logging initialized using configuration in
Hive history file=/tmp/root/[EMAIL PROTECTED]
hive> show databases;
OK
default
Time taken: 8.035 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.018 seconds
hive> show tables;
OK
Time taken: 4.175 seconds
The strange thing is that the table default.customers doesn't exist in
Hive, even though the Sqoop output mentioned it.
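One guess on my side: with the default embedded Derby metastore, Hive
creates a metastore_db directory in whatever working directory it is
launched from, so if Sqoop launched Hive from its own directory, the
table could be registered in a separate metastore_db there:

# if the first of these exists, the import went into a different metastore
ls ~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0/metastore_db
ls ~/siddharth/tools/hive-0.11.0-bin/metastore_db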
On Wed, Jul 3, 2013 at 9:36 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> Hi Siddharth,
> using a directory in the LOAD DATA command is completely valid. You can find more information about the command in the Hive documentation [1]. I would estimate that your issue is more with parsing the data than with accessing it, since you are able to see the rows, just with incorrect values.
> 1: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
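> For example, if the table's field delimiter does not match what is
> actually in the files, every row loads but the columns come back NULL.
> A quick way to check what delimiter the table was created with (a
> sketch, table name assumed):
>
> hive> DESCRIBE FORMATTED Customers;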
> On Wed, Jul 03, 2013 at 05:11:47PM +0530, Siddharth Karandikar wrote:
>> While looking into the Hive history file, I found this query:
>> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
>> TABLE `Customers`
>> The HDFS location mentioned in this query is a directory, not a CSV
>> file. The directory contains the part-* file(s) which hold the actual
>> data. Does Sqoop understand this directory structure and know how to
>> read those multiple part-* files, or is this an issue?
>> I was hit by a similar thing while creating an external table in Hive
>> where the location specified was such an HDFS directory (generated by
>> a Sqoop import) containing multiple part-* files. The Hive table got
>> created, but all the rows were NULL. That's why I started looking into
>> the --hive-import option available in Sqoop, but it looks like that is
>> also not working for me.
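>> (In hindsight the delimiter may be the culprit there: Sqoop writes
>> comma-separated text by default, while Hive's default field delimiter
>> is Ctrl-A. A sketch of that external table with the delimiter declared
>> explicitly, column list shortened to two assumed ClassicModels columns:)
>> CREATE EXTERNAL TABLE Customers (
>>     customerNumber INT,    -- columns shortened for illustration
>>     customerName STRING)
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>> LOCATION 'hdfs://localhost:9000/user/root/Customers';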
>> Am I missing something?
>> On Wed, Jul 3, 2013 at 4:55 PM, Siddharth Karandikar
>> <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> > I am facing some problems while importing a sample database from MySQL
>> > to Hive using Sqoop 1.4.3, Hive 0.11.0 and Hadoop 1.1.2 on a
>> > single-node setup.
>> > While doing this, I am always seeing the following message in the job logs -
>> > Table default.customers stats: [num_partitions: 0, num_files: 2,
>> > num_rows: 0, total_size: 15556, raw_data_size: 0]
>> > The job ends with a success message -
>> > 13/07/03 05:09:30 INFO hive.HiveImport: Time taken: 0.74 seconds