Sqoop >> mail # user >> Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.


Matthieu Labour 2012-10-10, 16:30
Jarek Jarcec Cecho 2012-10-10, 16:40
Matthieu Labour 2012-10-10, 17:16
Jarek Jarcec Cecho 2012-10-10, 17:27
Re: Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Jarcec,

I am quite new to Hadoop and Amazon EMR. Where are those files located?
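
(On a stock Hadoop 1.x EMR AMI the per-attempt task logs usually sit under hadoop.log.dir on the node that ran the attempt; assuming the default of /mnt/var/log/hadoop, something along these lines should find them, reusing the job ID from the logs further down:

ls /mnt/var/log/hadoop/userlogs/job_201210101503_0024/
cat /mnt/var/log/hadoop/userlogs/job_201210101503_0024/attempt_*/syslog

The exact path varies by AMI version, so treat this as a sketch.)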

Here is what I am doing:

1) I am using Amazon Elastic MapReduce and have created a new job flow that
does not terminate and whose type is HBase
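
(For reference, a CLI invocation along these lines should create such a job flow; the flags are from the 2012-era elastic-mapreduce Ruby client, with --alive keeping the cluster running after its steps finish, so treat this as a sketch rather than the exact command used:

./elastic-mapreduce --create --alive --hbase --name "sqooping"
)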

2) I get the job ID:
myaccount@ubuntu:~/elastic-mapreduce-cli$ ./elastic-mapreduce --list
--active
j-3EFP15LBJC8R4     RUNNING
ec2-XXX-XX-XXX-XX.compute-1.amazonaws.com        sqooping
   COMPLETED      Setup Hadoop Debugging
   COMPLETED      Start HBase
   COMPLETED      Setup Hive
   RUNNING        Setup Pig

3) I attach and run a step:
./elastic-mapreduce -j j-3EFP15LBJC8R4 --jar
s3://elasticmapreduce/libs/script-runner/script-runner.jar --arg
s3://mybucket/sqoop/sqoop.sh

4) I ssh into the machine:
ssh -i ~/.ec2/MYKEY.pem [EMAIL PROTECTED]

5) tail -f /mnt/var/lib/hadoop/steps/6/stderr shows the MapReduce job
hanging:
12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat: Generated splits:
12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat:
Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52
Locations:ip-10-77-70-192.ec2.internal:;
12/10/10 17:46:58 INFO mapred.JobClient: Running job: job_201210101503_0024
12/10/10 17:46:59 INFO mapred.JobClient:  map 0% reduce 0%
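
(A stall at "map 0% reduce 0%" means the map tasks were launched but none is making progress; for an export, each mapper opens its JDBC connection more or less immediately, so a silent hang here usually points at the database connection rather than at the data. On Hadoop 1.x the hanging job can be inspected from the master, e.g.:

hadoop job -status job_201210101503_0024
)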

6) In /mnt/var/lib/hadoop/steps/6 there is the sqoop.sh script file with:
~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect
jdbc:mysql://hostname:3306/analyticsdb --username username --password
password --table ml_ys_log_gmt_test --export-dir
hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
--input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch
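
(Side note: the single quotes keep the shell from touching \t and \n; Sqoop itself unescapes them into a real tab and newline. If in doubt about which delimiter the export files actually use, dumping a few bytes shows it directly:

hadoop fs -cat /mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000 | od -c | head
)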

On that same machine, in the same location (/mnt/var/lib/hadoop/steps/6), the
following command works:
mysql -h hostname -P 3306 -u username -p
password: password
Afterwards I can use the database, describe the table, etc.
Please note the MySQL machine is running on Amazon RDS and I have
added the ElasticMapReduce-master security group to RDS
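
(Worth checking: the map tasks run on the core/task nodes, which EMR places in the ElasticMapReduce-slave security group rather than the master group, so a successful connection from the master does not prove the mappers can connect. Assuming nc is installed, a quick reachability test from each slave node, with hostname again standing in for the RDS endpoint, would be:

nc -z -w 5 hostname 3306 && echo "3306 reachable" || echo "3306 blocked"
)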

Thank you for your help
On Wed, Oct 10, 2012 at 1:27 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>wrote:

> It would be very helpful if you could send us the task log from one map job
> that Sqoop executes.
>
> Blindly shooting - Sqoop is connecting to your database from map tasks.
> Based on the connection issues - are you sure that you can connect to your
> database from all nodes in your cluster?
>
> Jarcec
>
> On Wed, Oct 10, 2012 at 01:16:03PM -0400, Matthieu Labour wrote:
> > Hi Jarek
> >
> > Thank you so much for your help.
> >
> > Following your advice, I ran the following command:
> > ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect
> > jdbc:mysql://hostname:3306/analyticsdb --username username --password
> > password --table ml_ys_log_gmt_test --export-dir
> > hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
> > --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose
> >
> > It seems to find the file to export. So that is good. In the log I see the
> > following: (I am not sure why :0+52 gets appended)
> > 12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat:
> > Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52
> > Locations:ip-XX-XX-XX-XXX.ec2.internal:;
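
(That :0+52 suffix is nothing to worry about: it is how Hadoop prints an input split, as path:start+length in bytes. FileSplit.toString() emits file + ":" + start + "+" + length, so part-m-00000:0+52 is a single split covering the whole 52-byte file.)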
> >
> > However it hangs forever after it printed the following:
> > 12/10/10 16:43:42 INFO mapred.JobClient:  map 0% reduce 0%
> >
> > Then it seems the JDBC connection is eventually timing out.
> > 12/10/10 16:47:07 INFO mapred.JobClient: Task Id :
> > attempt_201210101503_0019_m_000000_0, Status : FAILED
> >
> > Here is the log towards the end:
> >
> > 12/10/10 16:43:40 INFO mapred.JobClient: Default number of map tasks: 4
> > 12/10/10 16:43:40 INFO mapred.JobClient: Default number of reduce tasks: 0
> > 12/10/10 16:43:41 INFO mapred.JobClient: Setting group to hadoop
> > 12/10/10 16:43:41 INFO input.FileInputFormat: Total input paths to process : 1
> > 12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat: Target numMapTasks=4

Matthieu Labour, Engineering | ActionX |
584 Broadway, Suite 1002 – NY, NY 10012
415-994-3480 (m)
Matthieu Labour 2012-10-10, 21:22
Jarek Jarcec Cecho 2012-10-10, 23:58
Matthieu Labour 2012-10-11, 14:39
Jarek Jarcec Cecho 2012-10-11, 15:38
Jarek Jarcec Cecho 2012-10-15, 21:40
Matthieu Labour 2012-10-17, 21:53
Jarek Jarcec Cecho 2012-10-17, 21:58
Matthieu Labour 2012-10-17, 22:01