Sqoop, mail # user - Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.


Re: Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Matthieu Labour 2012-10-10, 18:06
Jarcec,

I am quite new to Hadoop and Amazon EMR. Where are those files located?

Here is what I am doing:

1) I am using Amazon Elastic MapReduce and I have created a new job that
does not terminate and whose type is HBase.

2) I get the job id:
myaccount@ubuntu:~/elastic-mapreduce-cli$ ./elastic-mapreduce --list --active
j-3EFP15LBJC8R4     RUNNING
ec2-XXX-XX-XXX-XX.compute-1.amazonaws.com        sqooping
   COMPLETED      Setup Hadoop Debugging
   COMPLETED      Start HBase
   COMPLETED      Setup Hive
   RUNNING        Setup Pig

3) I attach and run a step:
./elastic-mapreduce -j j-3EFP15LBJC8R4 --jar
s3://elasticmapreduce/libs/script-runner/script-runner.jar --arg
s3://mybucket/sqoop/sqoop.sh

4) I SSH into the machine: ssh -i ~/.ec2/MYKEY.pem [EMAIL PROTECTED]

5) tail -f /mnt/var/lib/hadoop/steps/6/stderr shows the MapReduce job
hanging:
12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat: Generated splits:
12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat:
Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52
Locations:ip-10-77-70-192.ec2.internal:;
12/10/10 17:46:58 INFO mapred.JobClient: Running job: job_201210101503_0024
12/10/10 17:46:59 INFO mapred.JobClient:  map 0% reduce 0%
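When a job sticks at "map 0% reduce 0%" like this, the per-attempt task logs usually say why. A hedged sketch for digging them out on the node that ran the attempt; the userlogs base path and the reuse of the job id from the output above are assumptions based on a default Hadoop 1.x layout, not something confirmed in this thread:

```shell
# Task attempt logs typically live under hadoop.log.dir/userlogs on the
# node that executed the attempt. Adjust LOG_BASE for your AMI.
JOB_ID=job_201210101503_0024            # taken from the JobClient output above
LOG_BASE=/mnt/var/log/hadoop/userlogs   # assumed default; verify on your node
for d in "$LOG_BASE"/attempt_${JOB_ID#job_}_m_*; do
  echo "== $d =="
  # Show the tail of the attempt's syslog; ignore missing files.
  cat "$d/syslog" 2>/dev/null | tail -n 20
done
```

A JDBC connect timeout or `Communications link failure` in those logs would confirm that the map tasks, not the client, are failing to reach MySQL.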

6) In /mnt/var/lib/hadoop/steps/6 there is the sqoop.sh script file with
~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect
jdbc:mysql://hostname:3306/analyticsdb --username username --password
password --table ml_ys_log_gmt_test --export-dir
=hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
--input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch

On that same machine, in the same location (/mnt/var/lib/hadoop/steps/6), the
following command works:
mysql -h hostname -P 3306 -u username -p
password: password
Afterwards I can use the database, describe the table, etc.
Please note the MySQL machine is running on Amazon RDS and I have
added the ElasticMapReduce-master security group to RDS.
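Since the command works from the master but the map tasks still time out, it is worth checking that every slave node (not just the master, whose security group was added to RDS) can reach the RDS endpoint, because each map task opens its own JDBC connection. A minimal sketch, assuming `hostname` stands in for the real RDS endpoint and that `nc` is installed on the nodes:

```shell
# Run this on each slave node (e.g. via ssh from the master).
RDS_HOST=hostname   # placeholder: replace with the actual RDS endpoint
RDS_PORT=3306
# -z: just scan, don't send data; -w 5: five-second timeout.
if nc -z -w 5 "$RDS_HOST" "$RDS_PORT"; then
  echo "MySQL port reachable from $(hostname)"
else
  echo "MySQL port NOT reachable from $(hostname)"
fi
```

If a slave prints "NOT reachable", the fix is usually to authorize the slave/core nodes' security group on the RDS instance as well, not only the master's.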

Thank you for your help
On Wed, Oct 10, 2012 at 1:27 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>wrote:

> It would be very helpful if you could send us task log from one map job
> that Sqoop executes.
>
> Blindly shooting - Sqoop is connecting to your database from map tasks.
> Based on the connection issues - are you sure that you can connect to your
> database from all nodes in your cluster?
>
> Jarcec
>
> On Wed, Oct 10, 2012 at 01:16:03PM -0400, Matthieu Labour wrote:
> > Hi Jarcec,
> >
> > Thank you so much for your help.
> >
> > Following your advice, I ran the following command:
> > ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect
> > jdbc:mysql://hostname:3306/analyticsdb --username username --password
> > password --table ml_ys_log_gmt_test --export-dir
> > hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
> > --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose
> >
> > It seems to find the file to export. So that is good. In the log I see the
> > following (I am not sure why :0+52 gets appended):
> > 12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat:
> > Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52
> > Locations:ip-XX-XX-XX-XXX.ec2.internal:;
> >
> > However it hangs forever after it printed the following:
> > 12/10/10 16:43:42 INFO mapred.JobClient:  map 0% reduce 0%
> >
> > Then it seems the JDBC connection eventually times out:
> > 12/10/10 16:47:07 INFO mapred.JobClient: Task Id :
> > attempt_201210101503_0019_m_000000_0, Status : FAILED
> >
> > Here is the log towards the end:
> >
> > 12/10/10 16:43:40 INFO mapred.JobClient: Default number of map tasks: 4
> > 12/10/10 16:43:40 INFO mapred.JobClient: Default number of reduce tasks: 0
> > 12/10/10 16:43:41 INFO mapred.JobClient: Setting group to hadoop
> > 12/10/10 16:43:41 INFO input.FileInputFormat: Total input paths to process : 1
> > 12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat: Target numMapTasks=4

Matthieu Labour, Engineering | ActionX |
584 Broadway, Suite 1002 – NY, NY 10012
415-994-3480 (m)