Sqoop >> mail # user >> Zero rows imported while doing Mysql to Hive import


Siddharth Karandikar 2013-07-03, 11:25
Jarek Jarcec Cecho 2013-07-03, 16:01
Siddharth Karandikar 2013-07-03, 11:41
Jarek Jarcec Cecho 2013-07-03, 16:06
Re: Zero rows imported while doing Mysql to Hive import
Hi Jarek,

I have not re-configured Hive. I am using the default
settings/locations, and I am passing --hive-home to tell Sqoop where to
find Hive.

Here are the locations of my sqoop, Hive and Hadoop instances.
Hadoop:    /root/siddharth/tools/hadoop-1.1.2
Hive:    /root/siddharth/tools/hive-0.11.0-bin
Sqoop:    /root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0
And here are a few more details after running it with --verbose.

I am using the following command to import into Hive:
ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop
import --connect jdbc:mysql://localhost/ClassicModels --table Customers
-m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin --hive-import
--verbose --mysql-delimiters

Verbose output of above command:
http://pastebin.com/TcYG8vkr
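One thing that may be worth a second look (an assumption on my side, not something the log above confirms): --mysql-delimiters writes fields enclosed in single quotes, and Hive's default SerDe does not strip enclosing quotes, so each quoted field reaches Hive verbatim. A minimal sketch of what Hive effectively sees (the sample line is made up, not from the real Customers table):

```shell
# Simulated line as --mysql-delimiters would write it: comma-separated,
# each field enclosed in single quotes. Splitting on the comma (roughly
# what Hive's default SerDe does) keeps the quotes inside the value, so
# a numeric column would parse to NULL in Hive.
echo "'103','Atelier graphique','Schmitt'" | awk -F',' '{print $1}'
```

The first field comes out as '103' with the quotes still attached, not as the number 103.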

After running this command here is what I see in Hive and HDFS

HDFS
====
ssk01:~/siddharth/tools/hadoop-1.1.2 # bin/hadoop fs -ls hdfs://localhost:9000/user/hive/warehouse/*
Found 2 items
-rw-r--r--   1 root supergroup          0 2013-07-04 00:41
/user/hive/warehouse/customers/_SUCCESS
-rw-r--r--   1 root supergroup      15569 2013-07-04 00:41
/user/hive/warehouse/customers/part-m-00000
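The non-empty part file suggests the map task did export rows, so counting the lines in the file is a quick cross-check against Hive's reported num_rows: 0. A sketch (the real file lives on the HDFS instance above, so a stand-in local file is used here):

```shell
# Stand-in for the warehouse part file; on the real cluster the
# equivalent check would be:
#   bin/hadoop fs -cat hdfs://localhost:9000/user/hive/warehouse/customers/part-m-00000 | wc -l
printf "row1\nrow2\nrow3\n" > /tmp/part-m-00000
wc -l < /tmp/part-m-00000
```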
Hive (I am running Hive from its own directory, so the metadata should be accessible)
====
ssk01:~/siddharth/tools/hive-0.11.0-bin # ./bin/hive

Logging initialized using configuration in
jar:file:/root/siddharth/tools/hive-0.11.0-bin/lib/hive-common-0.11.0.jar!/hive-log4j.properties
Hive history file=/tmp/root/[EMAIL PROTECTED]
hive> show databases;
OK
default
Time taken: 8.035 seconds, Fetched: 1 row(s)

hive> use default;
OK
Time taken: 0.018 seconds

hive> show tables;
OK
Time taken: 4.175 seconds
hive>

The strange thing is that the table default.customers doesn't exist in
Hive even though the Sqoop output said it was created.
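One possible explanation, assuming nothing in hive-site.xml points at a shared metastore (an assumption; the config isn't shown in this thread): with the default embedded Derby metastore, Hive creates a metastore_db directory in whatever working directory it is launched from. If the Sqoop-driven Hive import ran from the Sqoop directory, the customers table would be registered in a metastore_db there, invisible to a Hive started from the Hive directory. A sketch of how to spot two separate metastores (simulated layout, since the real paths are on the box from this thread):

```shell
# Simulate the suspected layout: one embedded Derby metastore created by
# the Sqoop-launched Hive, another by the manually launched Hive.
TOOLS=/tmp/tools_demo
mkdir -p "$TOOLS/sqoop-1.4.3.bin__hadoop-1.0.0/metastore_db" \
         "$TOOLS/hive-0.11.0-bin/metastore_db"
# Two hits here would mean the two Hive sessions used different metastores.
find "$TOOLS" -maxdepth 2 -type d -name metastore_db | sort
```

Running the same find against the real tools directory would show whether two metastore_db directories actually exist there.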
Thanks,
Siddharth

On Wed, Jul 3, 2013 at 9:36 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> Hi Siddharth,
> using a directory in the LOAD DATA command is completely valid. You can find more information about the command in the Hive documentation [1]. Since you are able to see the rows, just with incorrect values, I would guess that your issue is with parsing the data rather than accessing it.
>
> Jarcec
>
> Links:
> 1: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
>
> On Wed, Jul 03, 2013 at 05:11:47PM +0530, Siddharth Karandikar wrote:
>> Hi,
>>
>> While looking into Hive history file, I found this query.
>>
>> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
>> TABLE `Customers`"
>> QUERY_ID="root_20130703050909_882c2484-e1c8-43a3-9eff-dd0f296fc560"
>> .....
>>
>> The HDFS location mentioned in this query is a directory, not a CSV
>> file. This directory contains the part-* file(s) which hold the actual
>> data. I don't know whether Sqoop understands this directory structure
>> and knows how to read those multiple part-* files. Or is this an issue?
>>
>> I was hit by a similar thing while creating an external table in Hive
>> where the location specified was such an HDFS directory (generated by
>> Sqoop import) containing multiple part-* files. The Hive table got
>> created, but all the rows were NULL. And that's why I started looking
>> into the --hive-import option available in Sqoop. But it looks like
>> that is not working for me either.
>>
>> Am I missing something?
>>
>>
>> Thanks,
>> Siddharth
>>
>> On Wed, Jul 3, 2013 at 4:55 PM, Siddharth Karandikar
>> <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >
>> > I am facing some problems while importing a sample database from MySQL
>> > to Hive using Sqoop 1.4.3, Hive 0.11.0 and Hadoop 1.1.2 on a single
>> > node setup.
>> >
>> > While doing this, I am always seeing the following message in the job logs -
>> > Table default.customers stats: [num_partitions: 0, num_files: 2,
>> > num_rows: 0, total_size: 15556, raw_data_size: 0]
>> >
>> > The job ends with a success message -
>> > 13/07/03 05:09:30 INFO hive.HiveImport: Time taken: 0.74 seconds
Siddharth Karandikar 2013-07-05, 13:47
Siddharth Karandikar 2013-07-05, 14:06
Han Sen Tey 2013-07-05, 14:23
Siddharth Karandikar 2013-07-05, 19:09
Jarek Jarcec Cecho 2013-07-08, 15:35
Jarek Jarcec Cecho 2013-07-08, 15:30