Sqoop >> mail # user >> Joins in Sqoop

RE: Joins in Sqoop
I tried  Option 2 and things are working fine. Thank you all for your valuable suggestions.

Meanwhile, I came across group_concat() in Impala, which does exactly what GROUP_CONCAT() does in MySQL. So if you have Impala configured in your cluster, you can try this:
1. Import all the tables to Hive using Sqoop.
2. Make the proper delimiter adjustments during import.
3. Run the query in the Impala shell.
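The three steps above can be sketched roughly as follows. This is a hedged sketch, not the poster's actual commands: the table and column names (ORDERS, ORDER_ITEMS, customer_id, item_name) are hypothetical, and the connect string is the placeholder used elsewhere in this thread.

```shell
# 1) Import each source table into Hive (one sqoop command per table; ORDERS shown):
sqoop import \
  --connect jdbc:mysql://10.xx.xx.xx:3306/db_name \
  --username user1 --password '******' \
  --table ORDERS \
  --hive-import --hive-table orders \
  --fields-terminated-by '\001'   # 2) pick a delimiter that cannot occur in the data

# 3) Refresh Impala's view of the metastore, then run the join + group_concat there:
impala-shell -q "INVALIDATE METADATA;
  SELECT o.customer_id,
         group_concat(i.item_name, ',') AS items
  FROM orders o
  JOIN order_items i ON o.order_id = i.order_id
  GROUP BY o.customer_id;"
```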


From: Chalcy [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 21, 2014 10:37 PM
Subject: Re: Joins in Sqoop

I thought so, since you did not have space in the MySQL db.

You have to find a way to break the data into smaller buckets and then get it into Hive, or, as I suggested originally, bring the data into Hive and do what you have to do there.

The options for Hive are:
1. Get only the joined data into Hive, concatenate the columns into another table, and then do a GROUP BY on that concatenated field.
2. Get the tables individually into Hive, do the join, concatenate the fields into a new table, and then run the GROUP BY.
3. Write a simple Hive UDF that does what the MySQL function does, after bringing the joined tables into Hive.
4. Bring the joined tables into Hive and write a MapReduce job to do what you want to do.
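Options 1 and 2 can be sketched in HiveQL. Hive has no group_concat() built-in, but concat_ws() over collect_list() gives the same effect; the table and column names below are hypothetical, not from the thread.

```shell
hive -e "
  -- join the imported tables and keep the joined data in its own table (option 2)
  CREATE TABLE joined AS
  SELECT o.customer_id, i.item_name
  FROM orders o
  JOIN order_items i ON o.order_id = i.order_id;

  -- group_concat equivalent: collect the values per group, then join with a separator
  SELECT customer_id,
         concat_ws(',', collect_list(item_name)) AS items
  FROM joined
  GROUP BY customer_id;"
```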

Hope you find a way to do it and post it here.


On Tue, Jan 21, 2014 at 11:47 AM, Sambit Tripathy (RBEI/PJ-NBS) <[EMAIL PROTECTED]> wrote:
I have tried it but no luck. I got the same error again: "Errcode: 28 - No space left on device", i.e. the MySQL server is running out of space. It looks like the temporary table that gets created also takes up a lot of storage space.

1. I created a temp table:

sqoop eval --connect jdbc:mysql://10.xx.xx.xx:3306/db_name --username user1 --password ****** --query "CREATE TEMPORARY TABLE IF NOT EXISTS LIFECYCLE_DAT_TEMP AS (SELECT * FROM VIEW_JOIN)"

where VIEW_JOIN --> the view that contains the JOINs

2. Then the import command:

sqoop import --connect jdbc:mysql://10.xx.xx.xx:3306/db_name --table LIFECYCLE_DAT_TEMP --username user1 --password ******* --split-by timestamp
@ Jarcec: Is this what you have suggested?

Cloudera provides the Teradata Connector for Sqoop, which does this kind of staging by default, but I could not find anything similar for MySQL.
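A caveat worth noting about the two-step approach above: MySQL TEMPORARY tables are visible only to the session that created them, and sqoop import opens its own JDBC connections, so staging in a regular table (dropped afterwards) is safer. A sketch using the same placeholder connect string and table names from the commands above:

```shell
# Stage the join result in a REGULAR table so Sqoop's import connections can see it:
sqoop eval \
  --connect jdbc:mysql://10.xx.xx.xx:3306/db_name \
  --username user1 --password '******' \
  --query "CREATE TABLE LIFECYCLE_DAT_STAGE AS (SELECT * FROM VIEW_JOIN)"

# Import the staged table, splitting on the same column as before:
sqoop import \
  --connect jdbc:mysql://10.xx.xx.xx:3306/db_name \
  --username user1 --password '******' \
  --table LIFECYCLE_DAT_STAGE \
  --split-by timestamp

# Clean up the staging table afterwards:
sqoop eval \
  --connect jdbc:mysql://10.xx.xx.xx:3306/db_name \
  --username user1 --password '******' \
  --query "DROP TABLE LIFECYCLE_DAT_STAGE"
```

This still needs enough space on the MySQL server to hold the staged join result, so it helps with the temp-directory limit rather than with total disk space.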


-----Original Message-----
From: Sambit Tripathy (RBEI/PJ-NBS) [mailto:[EMAIL PROTECTED]]
Sent: Saturday, January 18, 2014 12:30 AM
Subject: RE: Joins in Sqoop

That sounds good and in the past I have used temp tables for some other stuff and they work. I will try it out and post my observations.

-----Original Message-----
From: Jarek Jarcec Cecho [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 17, 2014 10:25 PM
Subject: Re: Joins in Sqoop

Large joins are better performed on the database side and stored in a temporary table (CREATE TABLE temp_tbl AS SELECT ...) that Sqoop can subsequently import without creating large temp files.


On Thu, Jan 16, 2014 at 09:20:04PM +0800, Sambit Tripathy (RBEI/PJ-NBS) wrote:
> Hi,
> I have written a query with 5 JOIN clauses and I am passing this query to Sqoop import.
> Problem: This produces a large temp file in the MySQL server's temp directory and throws back an error saying "No space left on device". Yes, this can be fixed by increasing the size of the temp directory on the MySQL server, but what if you actually don't have any space left on the MySQL server? Are there any workarounds for this? I mean something like a batch import which does not create a big temp file on the server.
> Regards,
> Sambit.
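One more workaround in the spirit of the original question, not suggested in the thread: Sqoop's free-form query import with $CONDITIONS splits the query across mappers, so MySQL runs several smaller bounded joins instead of materializing one huge result, which may keep each server-side temp file smaller. The table and column names here are hypothetical.

```shell
# Each mapper substitutes its own range predicate for $CONDITIONS,
# turning one huge join into several smaller bounded joins:
sqoop import \
  --connect jdbc:mysql://10.xx.xx.xx:3306/db_name \
  --username user1 --password '******' \
  --query 'SELECT a.*, b.item_name
           FROM orders a JOIN order_items b ON a.order_id = b.order_id
           WHERE $CONDITIONS' \
  --split-by a.order_id \
  --target-dir /user/hive/warehouse/join_result \
  -m 8
```

Whether this actually avoids large temp files depends on MySQL's plan for each sub-query, so it is worth testing with a small -m first.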