Sqoop user mailing list: Zero rows imported while doing Mysql to Hive import


Earlier messages in this thread:
  Siddharth Karandikar 2013-07-03, 11:25
  Jarek Jarcec Cecho 2013-07-03, 16:01
  Siddharth Karandikar 2013-07-03, 11:41
  Jarek Jarcec Cecho 2013-07-03, 16:06
Re: Zero rows imported while doing Mysql to Hive import
Hi Jarek,

I have not re-configured Hive. I am using the default
settings/locations, and I am using --hive-home to tell Sqoop where to
find Hive.

Here are the locations of my Sqoop, Hive, and Hadoop installations.
Hadoop:    /root/siddharth/tools/hadoop-1.1.2
Hive:    /root/siddharth/tools/hive-0.11.0-bin
Sqoop:    /root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0
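(As an aside, and as an assumption on my part about how Sqoop resolves
these paths: Sqoop can usually also pick the locations up from the
environment, so an alternative to passing --hive-home on every run
would be something like the following; exact variable handling can
vary between Sqoop versions.)

# Sketch: point Sqoop at Hadoop and Hive via the environment instead
# of command-line flags.
export HADOOP_HOME=/root/siddharth/tools/hadoop-1.1.2
export HIVE_HOME=/root/siddharth/tools/hive-0.11.0-bin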
And here are a few more details after running it with --verbose.

I am using the following command to import into Hive:
ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop
import --connect jdbc:mysql://localhost/ClassicModels -table Customers
-m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin --hive-import
--verbose --mysql-delimiters

Verbose output of the above command:
http://pastebin.com/TcYG8vkr

After running this command, here is what I see in HDFS and in Hive.

HDFS
====
ssk01:~/siddharth/tools/hadoop-1.1.2 # bin/hadoop fs -ls hdfs://localhost:9000/user/hive/warehouse/*
Found 2 items
-rw-r--r--   1 root supergroup          0 2013-07-04 00:41
/user/hive/warehouse/customers/_SUCCESS
-rw-r--r--   1 root supergroup      15569 2013-07-04 00:41
/user/hive/warehouse/customers/part-m-00000
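
One quick way to confirm what actually landed in that part file (a
minimal check, using the path from the listing above) is to print its
first few lines and inspect the delimiters:

# Sketch: peek at the imported rows. With --mysql-delimiters the
# fields should be comma-separated and strings optionally quoted.
bin/hadoop fs -cat /user/hive/warehouse/customers/part-m-00000 | head -n 5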
Hive (I am running Hive from its own directory, so the metastore should be accessible)
====
ssk01:~/siddharth/tools/hive-0.11.0-bin # ./bin/hive

Logging initialized using configuration in
jar:file:/root/siddharth/tools/hive-0.11.0-bin/lib/hive-common-0.11.0.jar!/hive-log4j.properties
Hive history file=/tmp/root/[EMAIL PROTECTED]
hive> show databases;
OK
default
Time taken: 8.035 seconds, Fetched: 1 row(s)

hive> use default;
OK
Time taken: 0.018 seconds

hive> show tables;
OK
Time taken: 4.175 seconds
hive>

The strange thing is that the table default.customers doesn't exist in
Hive, even though the Sqoop output mentioned it.
Thanks,
Siddharth
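
One detail that may matter here (an assumption, since it depends on an
unmodified hive-site.xml): with Hive's default configuration the
embedded Derby metastore is created as ./metastore_db in whatever
directory Hive is started from. Sqoop invoked Hive from the Sqoop
directory, so the table's metadata may live there rather than next to
the Hive install:

# Sketch: check for per-directory Derby metastores; the Sqoop-driven
# Hive session would have registered the table in the metastore_db
# created under the directory Sqoop was run from.
ls -d /root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0/metastore_db
ls -d /root/siddharth/tools/hive-0.11.0-bin/metastore_db

If the first directory exists, starting ./bin/hive from the Sqoop
directory should show the customers table (assuming the above holds).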

On Wed, Jul 3, 2013 at 9:36 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> Hi Siddharth,
> using a directory in the LOAD DATA command is completely valid. You can find more information about the command in the Hive documentation [1]. Since you are able to see the rows, just with incorrect values, I would guess that the issue is with parsing the data rather than with accessing it.
>
> Jarcec
>
> Links:
> 1: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
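
The reason a LOAD DATA can "succeed" while the rows later come back
wrong is that Hive only parses data at query time (schema on read);
the load itself just moves files. A minimal HiveQL sketch of that
distinction, reusing the paths from the history-file query below:

-- LOAD DATA only moves files into the table's directory; nothing
-- validates them against the table schema at load time.
LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO TABLE `Customers`;
-- Parsing happens here: if the file delimiters do not match the
-- table's ROW FORMAT, columns come back NULL instead of the load failing.
SELECT * FROM `Customers` LIMIT 5;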
>
> On Wed, Jul 03, 2013 at 05:11:47PM +0530, Siddharth Karandikar wrote:
>> Hi,
>>
>> While looking into the Hive history file, I found this query.
>>
>> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
>> TABLE `Customers`
>> QUERY_ID="root_20130703050909_882c2484-e1c8-43a3-9eff-dd0f296fc560"
>> .....
>>
>> The HDFS location mentioned in this query is a directory, not a CSV
>> file. This directory contains the part-* file(s) which hold the actual
>> data. I don't know whether Sqoop understands this directory structure
>> and knows how to read those multiple part-* files. Or is this an issue?
>>
>> I was hit by a similar thing while creating an external table in Hive,
>> where the location specified was such an HDFS directory (generated by a
>> Sqoop import) containing multiple part-* files. The Hive table got
>> created, but all the rows were NULL (a possible cause is sketched after
>> the quoted messages below). That's why I started looking into the
>> --hive-import option available in Sqoop. But it looks like that is not
>> working for me either.
>>
>> Am I missing something?
>>
>>
>> Thanks,
>> Siddharth
>>
>> On Wed, Jul 3, 2013 at 4:55 PM, Siddharth Karandikar
>> <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >
>> > I am facing some problems while importing a sample database from MySQL
>> > to Hive using Sqoop 1.4.3, Hive 0.11.0 and Hadoop 1.1.2 on a single
>> > node setup.
>> >
>> > While doing this, I am always seeing the following message in the job logs -
>> > Table default.customers stats: [num_partitions: 0, num_files: 2,
>> > num_rows: 0, total_size: 15556, raw_data_size: 0]
>> >
>> > The job ends with a success message -
>> > 13/07/03 05:09:30 INFO hive.HiveImport: Time taken: 0.74 seconds
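
Regarding the NULL rows in the external-table attempt quoted above,
one possible explanation (an assumption based on the flags used): with
--hive-import and no delimiter flags, Sqoop writes Hive's default
delimiters (Ctrl-A fields, newline records), but --mysql-delimiters
switches the output to comma-separated fields with optional
single-quote enclosures, which a table declared without a matching ROW
FORMAT cannot parse, so every column reads back NULL. A sketch of a
definition matching such output (table and column names are
illustrative; ClassicModels has more columns than shown):

-- Sketch: declare delimiters matching what --mysql-delimiters wrote.
-- Caveat: the DELIMITED row format does not strip enclosing quotes,
-- so dropping --mysql-delimiters from the import is often simpler.
CREATE EXTERNAL TABLE customers_ext (
  customerNumber INT,
  customerName   STRING
  -- remaining columns omitted for brevity
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
LOCATION 'hdfs://localhost:9000/user/root/Customers';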
Later replies in this thread:
  Siddharth Karandikar 2013-07-05, 13:47
  Siddharth Karandikar 2013-07-05, 14:06
  Han Sen Tey 2013-07-05, 14:23
  Siddharth Karandikar 2013-07-05, 19:09
  Jarek Jarcec Cecho 2013-07-08, 15:35
  Jarek Jarcec Cecho 2013-07-08, 15:30