Sqoop >> mail # user >> Joins in Sqoop


Re: Joins in Sqoop
I thought so, since you did not have space in the MySQL db.

You have to find a way to break the data into smaller buckets and then try
to get it into Hive, or, as I suggested originally, bring the data into Hive
and do what you have to do there.
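A minimal sketch of the "smaller buckets" idea. It assumes VIEW_JOIN has a date-like column (the `timestamp` column used with `--split-by` later in the thread) and that each bucket is pulled by a separate `sqoop import` run restricted with `--where`. The Python only builds the argument lists and does not run Sqoop. The connect string, username, HDFS target layout and bucket size are hypothetical placeholders. `-P` makes Sqoop prompt for the password.

```python
import shlex
from datetime import date, timedelta
from typing import Iterator, List, Tuple


def day_buckets(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield half-open [lo, hi) date ranges of at most `days` days covering [start, end)."""
    if days <= 0:
        raise ValueError("days must be positive")
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=days), end)
        yield lo, hi
        lo = hi


def sqoop_bucket_commands(connect: str, username: str, view: str, column: str,
                          start: date, end: date, days: int,
                          target_root: str) -> List[List[str]]:
    """Build one `sqoop import` argument list per date bucket; nothing is executed."""
    commands = []
    for i, (lo, hi) in enumerate(day_buckets(start, end, days)):
        where = f"{column} >= '{lo.isoformat()}' AND {column} < '{hi.isoformat()}'"
        commands.append([
            "sqoop", "import",
            "--connect", connect,
            "--username", username,
            "-P",                    # prompt for the password instead of passing it
            "--table", view,
            "--where", where,        # limits this run to one bucket
            "--split-by", column,
            "--target-dir", f"{target_root}/bucket_{i:03d}",  # one HDFS dir per bucket
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical connection details and paths, for illustration only.
    for cmd in sqoop_bucket_commands("jdbc:mysql://db.example.com:3306/db_name",
                                     "user1", "VIEW_JOIN", "timestamp",
                                     date(2014, 1, 1), date(2014, 1, 10), 3,
                                     "/user/example/lifecycle"):
        print(shlex.join(cmd))  # shell-safe quoting of the --where clause
```

Each list can be run with `subprocess.run(cmd, check=True)`. Whether a bucket actually shrinks MySQL's temp file depends on the query plan. If MySQL processes the view with the MERGE algorithm, the `--where` predicate can be applied inside the join. A view that needs the TEMPTABLE algorithm (for example one with GROUP BY or DISTINCT) may still be materialized in full for every bucket.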

The options in Hive are:
1. Get only the joined data into Hive, concatenate the columns in Hive,
store the result in another table, and then do a GROUP BY on that
concatenated field.
2. Get the tables individually into Hive, do the join there, concatenate the
fields, insert them into a new table, and then run the GROUP BY.
3. Write a simple Hive UDF that does what the MySQL function does, after
bringing the join tables into Hive.
4. Bring the join tables into Hive and write a MapReduce job to do what you
want.
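Options 1 and 2 share the same SQL shape: join, build one concatenated key column, store it, then GROUP BY that key. Below is a minimal sketch of option 2. It uses Python's built-in sqlite3 purely as a stand-in for Hive so that it runs anywhere, and the table and column names are made up. In HiveQL the concatenation would typically be written as `concat_ws('|', d.model, d.plant, e.status)` rather than with SQLite's `||` operator.

```python
import sqlite3
from typing import List, Tuple


def concat_group_by() -> List[Tuple[str, int]]:
    """Option 2 in miniature: join, concatenate fields into one column,
    store the result in a new table, then GROUP BY the concatenated field.
    SQLite stands in for Hive here; table and column names are made up."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript("""
            CREATE TABLE device (device_id INTEGER, model TEXT, plant TEXT);
            CREATE TABLE event  (device_id INTEGER, status TEXT);
            INSERT INTO device VALUES (1, 'A1', 'P1'), (2, 'A1', 'P1'), (3, 'B7', 'P2');
            INSERT INTO event  VALUES (1, 'ok'), (2, 'ok'), (2, 'fail'), (3, 'ok');

            -- join + concatenation, stored in a new table
            CREATE TABLE joined_concat AS
            SELECT d.model || '|' || d.plant || '|' || e.status AS concat_key
            FROM device d JOIN event e ON d.device_id = e.device_id;
        """)
        # GROUP BY the concatenated field
        return conn.execute("""
            SELECT concat_key, COUNT(*) FROM joined_concat
            GROUP BY concat_key ORDER BY concat_key
        """).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(concat_group_by())
```

The separator should be a character that cannot occur in the data. Otherwise two different column combinations can produce the same concatenated key.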

I hope you find a way to do it; please post it here.

Thanks,
Chalcy
On Tue, Jan 21, 2014 at 11:47 AM, Sambit Tripathy (RBEI/PJ-NBS) <
[EMAIL PROTECTED]> wrote:

> I have tried it, but no luck. I got the same error again: Errcode: 28 - No
> space left on device, i.e., the MySQL server is running out of space. It
> looks like the temporary table also takes up a lot of storage space.
>
> 1. I created a temp table
>
> sqoop eval --connect jdbc:mysql://10.xx.xx.xx:3306/db_name \
>     --username user1 --password ****** \
>     --query "CREATE TEMPORARY TABLE IF NOT EXISTS LIFECYCLE_DAT_TEMP AS (SELECT * FROM VIEW_JOIN)"
>
> where VIEW_JOIN is the view that contains the JOINs.
>
> 2. Then the import command:
>
> sqoop import --connect jdbc:mysql://10.xx.xx.xx:3306/db_name \
>     --table LIFECYCLE_DAT_TEMP --username user1 --password ******* \
>     --split-by timestamp
>
>
> @Jarcec: Is this what you suggested?
>
> Cloudera provides a Teradata Connector for Sqoop that does the same thing
> by default, but I could not find anything similar for MySQL.
>
>
>
> Regards,
> Sambit
>
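One thing worth checking in steps 1 and 2 above is table visibility. In MySQL a TEMPORARY table is visible only to the session that created it, and it is dropped when that session ends. `sqoop eval` and `sqoop import` open separate connections, so even with enough disk space the import would most likely not find LIFECYCLE_DAT_TEMP. Jarcec's suggestion uses a plain `CREATE TABLE ... AS SELECT` instead. The sketch below demonstrates this session scoping with Python's sqlite3, whose TEMP tables are also per-connection. It is a stand-in for MySQL, not MySQL itself, and the table names are made up.

```python
import os
import sqlite3
import tempfile
from typing import Dict


def temp_vs_plain_table_visibility() -> Dict[str, bool]:
    """Create a TEMP table and a plain table in one connection and report
    which of them a second connection to the same database can see."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        first = sqlite3.connect(path)    # plays the role of the `sqoop eval` session
        first.execute("CREATE TABLE source (id INTEGER)")
        first.execute("INSERT INTO source VALUES (1), (2)")
        first.execute("CREATE TEMP TABLE staged_temp AS SELECT * FROM source")
        first.execute("CREATE TABLE staged_plain AS SELECT * FROM source")
        first.commit()

        second = sqlite3.connect(path)   # plays the role of the later `sqoop import` session
        seen = {}
        for name in ("staged_temp", "staged_plain"):
            try:
                second.execute(f"SELECT COUNT(*) FROM {name}").fetchone()
                seen[name] = True
            except sqlite3.OperationalError:  # raised as "no such table"
                seen[name] = False
        second.close()
        first.close()
    return seen


if __name__ == "__main__":
    print(temp_vs_plain_table_visibility())
```

Only the plain table is visible to the second connection. This matches why the staging table in Jarcec's suggestion is created without TEMPORARY.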
> -----Original Message-----
> From: Sambit Tripathy (RBEI/PJ-NBS) [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, January 18, 2014 12:30 AM
> To: [EMAIL PROTECTED]
> Subject: RE: Joins in Sqoop
>
> That sounds good. I have used temp tables for other things in the past,
> and they worked. I will try it out and post my observations.
>
> -----Original Message-----
> From: Jarek Jarcec Cecho [mailto:[EMAIL PROTECTED]]
> Sent: Friday, January 17, 2014 10:25 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Joins in Sqoop
>
> Large joins are better performed on the database side and stored in a
> temporary table (CREATE TABLE temp_tbl AS SELECT ...) that Sqoop can
> subsequently import without creating large temp files.
>
> Jarcec
>
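Jarcec's suggestion amounts to a three-step "materialize, import, drop" workflow. A minimal sketch follows that only builds the Sqoop argument lists. The staging table name, split column, SELECT text and HDFS path are hypothetical, and `sqoop eval --query` runs the DDL. One caveat: the staging table still occupies space on the MySQL server, in the table's own storage rather than in tmpdir. This may help when tmpdir sits on a smaller, separate partition, but it is unlikely to help if the whole disk is full.

```python
import shlex
from typing import List


def staging_table_workflow(connect: str, username: str, select_sql: str,
                           staging_table: str, split_by: str,
                           target_dir: str) -> List[List[str]]:
    """Return the three Sqoop invocations for a 'materialize, import, drop' workflow.

    Nothing is executed here; each list can be passed to subprocess.run().
    """
    common = ["--connect", connect, "--username", username, "-P"]
    # Plain (not TEMPORARY) table, so the later import session can read it.
    create = ["sqoop", "eval", *common, "--query",
              f"CREATE TABLE {staging_table} AS {select_sql}"]
    import_ = ["sqoop", "import", *common, "--table", staging_table,
               "--split-by", split_by, "--target-dir", target_dir]
    # Free the space on the MySQL server once the data is in HDFS.
    drop = ["sqoop", "eval", *common, "--query", f"DROP TABLE {staging_table}"]
    return [create, import_, drop]


if __name__ == "__main__":
    for step in staging_table_workflow("jdbc:mysql://db.example.com:3306/db_name",
                                       "user1", "SELECT * FROM VIEW_JOIN",
                                       "temp_tbl", "timestamp",
                                       "/user/example/lifecycle"):
        print(shlex.join(step))
```

Running the steps in order with `subprocess.run(step, check=True)` stops before the DROP if the import fails, which keeps the staged data available for a retry.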
> On Thu, Jan 16, 2014 at 09:20:04PM +0800, Sambit Tripathy (RBEI/PJ-NBS)
> wrote:
> > Hi,
> >
> > I have written a query that has 5 JOIN clauses, and I am passing this
> > query to Sqoop import.
> >
> > Problem: this produces a large temp file in the MySQL server's temp
> > directory and throws back an error saying "No space left on device". Yes,
> > this can be fixed by increasing the size of the temp directory on the MySQL
> > server, but what if you actually don't have any space left on the MySQL
> > server? Are there any workarounds for this? I mean something like a batch
> > import that does not create a big temp file on the server.
> >
> >
> > Regards,
> > Sambit.
> >
>
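On the "batch import" question: with a free-form `--query`, Sqoop requires a `$CONDITIONS` token. For each mapper it replaces that token with a range predicate on the `--split-by` column, so each mapper runs the join over only its slice. The sketch below imitates that substitution with a simplified integer splitter. It is not Sqoop's exact splitting code, and the query, table and column names are made up. As I understand the Sqoop docs, the default boundary query (MIN/MAX of the split column) wraps the whole free-form query, so it may itself run the full join. Supplying a cheap `--boundary-query` avoids that.

```python
from typing import List


def split_predicates(column: str, lo: int, hi: int, num_mappers: int) -> List[str]:
    """Divide the integer range [lo, hi] into at most num_mappers contiguous
    ranges and return one WHERE predicate per range (simplified splitter)."""
    if num_mappers < 1 or hi < lo:
        raise ValueError("need num_mappers >= 1 and lo <= hi")
    bounds = [lo + (hi - lo) * i // num_mappers for i in range(num_mappers + 1)]
    bounds = list(dict.fromkeys(bounds))  # drop duplicate boundaries, keep order
    if len(bounds) == 1:
        return [f"{column} >= {lo} AND {column} <= {hi}"]
    preds = []
    for i in range(len(bounds) - 1):
        upper_op = "<=" if i == len(bounds) - 2 else "<"  # last range includes hi
        preds.append(f"{column} >= {bounds[i]} AND {column} {upper_op} {bounds[i + 1]}")
    return preds


def per_mapper_queries(query: str, predicates: List[str]) -> List[str]:
    """Substitute each range predicate for the $CONDITIONS token."""
    if "$CONDITIONS" not in query:
        raise ValueError("free-form query must contain $CONDITIONS")
    return [query.replace("$CONDITIONS", f"({p})") for p in predicates]


if __name__ == "__main__":
    q = "SELECT * FROM a JOIN b ON a.id = b.a_id WHERE $CONDITIONS"
    for mq in per_mapper_queries(q, split_predicates("a.id", 1, 100, 4)):
        print(mq)
```

Raising `--num-mappers`, or running several imports with narrower `--where` ranges, makes each slice smaller. Whether MySQL's temp file shrinks accordingly depends on whether the range predicate can restrict the join early in the query plan.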