Hive, mail # user - Import from MySQL to Hive using Sqoop


Omkar Joshi 2013-06-27, 04:13
Re: Import from MySQL to Hive using Sqoop
Nitin Pawar 2013-06-27, 06:17
Disclaimer: I am not a Sqoop guru, so these are just suggestions.

The Sqoop documentation says:
"Sqoop job to import data for Hive into a particular partition by
specifying the --hive-partition-key and --hive-partition-value arguments"

I have not tried these arguments myself, and I am not sure whether they work
with dynamic partitioning.
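
As a rough, untested sketch of what a per-partition import might look like,
the Python helper below only assembles the Sqoop command line for one
departure date. The JDBC URL, user, table and column names are made up for
illustration, and it assumes the partition column has to be left out of
--columns because Hive does not accept it as a regular column as well.

```python
import shlex
from typing import List


def partition_import_cmd(departure_date: str) -> List[str]:
    """Build a Sqoop import command that loads one day into one Hive partition."""
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/flights",    # hypothetical JDBC URL
        "--username", "etl_user",                      # hypothetical user
        "-P",                                          # prompt for the password
        "--table", "bookings",                         # hypothetical MySQL table
        "--columns", "booking_id,passenger_id,fare",   # hypothetical; excludes the partition column
        "--where", "departure_date = {}".format(departure_date),
        "--split-by", "booking_id",                    # hypothetical split column
        "--hive-import",
        "--hive-table", "bookings_partitioned",        # hypothetical Hive table
        "--hive-partition-key", "departure_date",
        "--hive-partition-value", departure_date,
    ]


if __name__ == "__main__":
    # Print a shell-ready command for a single day.
    print(" ".join(shlex.quote(arg) for arg in partition_import_cmd("20120605")))
```

Each run covers exactly one partition value, so loading two years of data
this way means one Sqoop invocation per day.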

Also, have you looked at incremental imports? With those you would not
have to import old data again and again.
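
For illustration only, an incremental import in append mode could be put
together roughly like this. The sketch assumes the table has some
monotonically increasing column to check (called booking_id here, which is a
made-up name) and writes to a hypothetical HDFS staging directory rather than
directly into Hive. I have not run it.

```python
from typing import List


def incremental_import_cmd(last_value: int) -> List[str]:
    """Build a Sqoop command that appends only rows newer than last_value."""
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/flights",    # hypothetical JDBC URL
        "--username", "etl_user",                      # hypothetical user
        "-P",                                          # prompt for the password
        "--table", "bookings",                         # hypothetical MySQL table
        "--target-dir", "/datastore/bookings_staging", # hypothetical HDFS directory
        "--split-by", "booking_id",                    # hypothetical split column
        "--incremental", "append",
        "--check-column", "booking_id",                # hypothetical increasing column
        "--last-value", str(last_value),               # highest value already imported
    ]


if __name__ == "__main__":
    print(" ".join(incremental_import_cmd(400000000)))
```

The value passed as last_value has to be recorded after each run so that the
next run picks up where the previous one stopped.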

Can you also post the same question to the Sqoop user group?

To answer your questions:

For (2), I have already given the options to use above.

For (3), as long as you import just one date's data at a time and your
partition key is that date column, you can write into a directory such as
hdfs://blah/datastore/table/partitioncolumn=value/
and then register that partition with Hive in one more step.

This is effectively what option (2) does: it imports data into a
single partition for a given value.
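
The partition key-value pairs do not have to be written out by hand. As a
minimal sketch, assuming one partition per calendar day and a string-valued
partition column, the Python below generates the values for the date range in
the question and the matching ALTER TABLE ... ADD PARTITION statements. The
table name, column name and base directory are placeholders.

```python
from datetime import date, timedelta
from typing import Iterator


def daily_partitions(start: date, end: date) -> Iterator[str]:
    """Yield one yyyymmdd partition value per day, both ends included."""
    day = start
    while day <= end:
        yield day.strftime("%Y%m%d")
        day += timedelta(days=1)


def add_partition_sql(table: str, key: str, value: str, base_dir: str) -> str:
    """Return the HiveQL that registers an existing directory as a partition."""
    return (
        "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION ({k}='{v}') "
        "LOCATION '{d}/{k}={v}';"
    ).format(t=table, k=key, v=value, d=base_dir)


if __name__ == "__main__":
    # Placeholder names; the date range is the one given in the question.
    for value in daily_partitions(date(2012, 6, 5), date(2014, 6, 5)):
        print(add_partition_sql("bookings_partitioned", "departure_date",
                                value, "/datastore/bookings"))
```

The printed statements can be saved to a file and run with hive -f once the
per-day directories have been populated.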

On Thu, Jun 27, 2013 at 9:43 AM, Omkar Joshi <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have to import > 400 million rows from a MySQL table (having a composite
> primary key) into a PARTITIONED Hive table via Sqoop. The table has
> data for two years with a column departure date ranging from 20120605 to
> 20140605 and thousands of records for one day. I need to partition the data
> based on the departure date.
>
> The versions:
>
> Apache Hadoop - 1.0.4
> Apache Hive   - 0.9.0
> Apache Sqoop  - sqoop-1.4.2.bin__hadoop-1.0.0
>
> As per my knowledge, there are 3 approaches:
>
> 1. MySQL -> Non-partitioned Hive table -> INSERT from
>    Non-partitioned Hive table into Partitioned Hive table
>    The current painful one that I'm following
>
> 2. MySQL -> Partitioned Hive table
>    I read that the support for this is added in later(?) versions of Hive and
>    Sqoop but was unable to find an example
>
> 3. MySQL -> Non-partitioned Hive table -> ALTER Non-partitioned
>    Hive table to add PARTITION
>    The syntax dictates to specify partitions as key value pairs - not
>    feasible in case of millions of records where one cannot think of all the
>    partition key-value pairs
>
> Can anyone provide inputs for approaches 2 and 3?
>
> Regards,
> Omkar Joshi
>

--
Nitin Pawar
Omkar Joshi 2013-06-27, 06:25
tofunmibabatunde@... 2013-06-27, 06:51