-
Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Matthieu Labour 2012-10-10, 16:30

Hi

I want to export data stored in Hadoop to MySQL. It is not working and I have been pulling my hair out. I was hoping to get a bit of help. Thank you in advance.

The command is the following:

~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect jdbc:mysql://hostname:3306/analyticsdb --username username --password password --table ml_ys_log_gmt_test --export-dir hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose

On my MySQL server, in the database analyticsdb, I have the following table ml_ys_log_gmt_test:

mysql> describe ml_ys_log_gmt_test;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| mydate | varchar(32) | YES  |     | NULL    |       |
| mydata | varchar(32) | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+

I can see the logs in HDFS:

hadoop@ip-XX-XX-XX-XX:/mnt/var/lib/hadoop/steps/5$ hadoop dfs -ls hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2012-10-10 15:23 /mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
drwxr-xr-x - hadoop supergroup 0 2012-10-10 15:23 /mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-02

and if I tail one of the files I see the correct data:

hadoop@ip-XX-XX-XX-XX:/mnt/var/lib/hadoop/steps/5$ hadoop dfs -tail -f hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000
20121001230101 blablabla1
20121001230202 blablabla2

Here is the trace when I run the command. Please note that no data gets transferred. I would appreciate any tips. Thanks a lot!

Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
12/10/10 16:25:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
12/10/10 16:25:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/10/10 16:25:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
12/10/10 16:25:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
12/10/10 16:25:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
12/10/10 16:25:25 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/10/10 16:25:25 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.MySQLManager@5ef4f44a
12/10/10 16:25:25 INFO tool.CodeGenTool: Beginning code generation
12/10/10 16:25:25 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
12/10/10 16:25:26 DEBUG manager.SqlManager: Using fetchSize for next query: -2147483648
12/10/10 16:25:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `ml_ys_log_gmt_test` AS t LIMIT 1
12/10/10 16:25:26 DEBUG orm.ClassWriter: selected columns:
12/10/10 16:25:26 DEBUG orm.ClassWriter: mydate
12/10/10 16:25:26 DEBUG orm.ClassWriter: mydata
12/10/10 16:25:26 DEBUG manager.SqlManager: Using fetchSize for next query: -2147483648
12/10/10 16:25:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `ml_ys_log_gmt_test` AS t LIMIT 1
12/10/10 16:25:26 DEBUG orm.ClassWriter: Writing source file: /tmp/sqoop-hadoop/compile/7f5cd67c0aa5dbf20256f72b30ae922b/ml_ys_log_gmt_test.java
12/10/10 16:25:26 DEBUG orm.ClassWriter: Table name: ml_ys_log_gmt_test
12/10/10 16:25:26 DEBUG orm.ClassWriter: Columns: mydate:12, mydata:12,
12/10/10 16:25:26 DEBUG orm.ClassWriter: sourceFilename is ml_ys_log_gmt_test.java
12/10/10 16:25:26 DEBUG orm.CompilationManager: Found existing /tmp/sqoop-hadoop/compile/7f5cd67c0aa5dbf20256f72b30ae922b/
12/10/10 16:25:26 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop
12/10/10 16:25:26 INFO orm.CompilationManager: Found hadoop core jar at: /home/hadoop/hadoop-core.jar
12/10/10 16:25:26 DEBUG orm.CompilationManager: Adding source file: /tmp/sqoop-hadoop/compile/7f5cd67c0aa5dbf20256f72b30ae922b/ml_ys_log_gmt_test.java
12/10/10 16:25:26 DEBUG orm.CompilationManager: Invoking javac with args:
12/10/10 16:25:26 DEBUG orm.CompilationManager: -sourcepath
12/10/10 16:25:26 DEBUG orm.CompilationManager: /tmp/sqoop-hadoop/compile/7f5cd67c0aa5dbf20256f72b30ae922b/
12/10/10 16:25:26 DEBUG orm.CompilationManager: -d
12/10/10 16:25:26 DEBUG orm.CompilationManager: /tmp/sqoop-hadoop/compile/7f5cd67c0aa5dbf20256f72b30ae922b/
12/10/10 16:25:26 DEBUG orm.CompilationManager: -classpath
12/10/10 16:25:26 DEBUG orm.CompilationManager: /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-core-1.0.3.jar:/home/hadoop/lib/activation-1.1.jar:/home/hadoop/lib/annotations.jar:/home/hadoop/lib/ant-1.8.1.jar:/home/hadoop/lib/ant-launcher-1.8.1.jar:/home/hadoop/lib/ant-nodeps-1.8.1.jar:/home/hadoop/lib/apache-jar-resource-bundle-1.4.jar:/home/hadoop/lib/asm-3.1.jar:/home/hadoop/lib/avro-1.5.3.jar:/home/hadoop/lib/avro-compiler-1.5.3.jar:/home/hadoop/lib/avro-ipc-1.5.3.jar:/home/hadoop/lib/avro-maven-plugin-1.5.3.jar:/home/hadoop/lib/aws-java-sdk-1.3.2.jar:/home/hadoop/lib/build-helper-maven-plugin-1.5.jar:/home/hadoop/lib/commons-beanutils-1.7.0.jar:/home/hadoop/lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/lib/commons-cli-1.2.jar:/home/hadoop/lib/commons-codec-1.5.jar:/home/hadoop/lib/commons-collections-3.2.1.jar:/home/hadoop/lib/commons-configuration-1.6.jar:/home/hadoop/lib/commons-daemon-1.0.1.jar:/home/hadoop/lib/commons-digester-1.8.jar:/home/hadoop/lib/commons-el-1.0.jar:/home/hadoop/lib/commons-httpclient-3.1.jar:/home/hadoop/lib/commons-io-2.4.jar:/home/hadoop/lib/commons-lang-2.5.jar:/home/hadoop/lib/commons-logging-1.1.1.jar:/home/hadoop/lib/commons-logging-adapters-1.1.1.jar:/home/hadoop/lib/commons-logging-api-1.1.1.jar:/home/hadoop/lib/commons-math-2.1.jar:/ho
-
Re: Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Jarek Jarcec Cecho 2012-10-10, 16:40

Hi sir,

As far as I remember, FileInputFormat does not descend recursively into subdirectories when looking for input files. Would you mind trying to export the directory /mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01 to see if it helps? Something like:

sqoop export ... --export-dir /mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01 ...

Jarcec
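If the export directory is indeed not searched recursively, the suggestion above amounts to running one export per partition directory. A minimal sketch of that loop follows. The function name `print_partition_exports` is hypothetical, the connection options stay elided as `...` exactly as in the reply, and the commands are only printed, not executed. The two partition names are the directories shown in the `hadoop dfs -ls` listing of the original message.

```shell
# Hypothetical helper (not part of Sqoop): prints one "sqoop export" command
# per partition directory. Nothing is executed; review the output first.
print_partition_exports() {
  base_dir="$1"   # parent directory that holds the dt=... partitions
  shift
  for partition in "$@"; do
    # "..." stands for the elided options (--connect, --table, and so on).
    printf 'sqoop export ... --export-dir %s/%s\n' "$base_dir" "$partition"
  done
}

print_partition_exports /mnt/var/lib/hadoop/dfs/logs_sanitized_test \
  dt=2012-10-01 dt=2012-10-02
```

Printing the commands first keeps the sketch safe to run anywhere; on a real cluster the elided options would have to be filled in before executing each line.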
-
Re: Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Matthieu Labour 2012-10-10, 17:16

Hi Jarek

Thank you so much for your help.

Following your advice, I ran the following command:

~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect jdbc:mysql://hostname:3306/analyticsdb --username username --password password --table ml_ys_log_gmt_test --export-dir hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01 --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose

It seems to find the file to export, so that is good. In the log I see the following (I am not sure why :0+52 gets appended):

12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat: Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52 Locations:ip-XX-XX-XX-XXX.ec2.internal:;

However, it hangs forever after printing the following:

12/10/10 16:43:42 INFO mapred.JobClient: map 0% reduce 0%

Then it seems the JDBC connection eventually times out:

12/10/10 16:47:07 INFO mapred.JobClient: Task Id : attempt_201210101503_0019_m_000000_0, Status : FAILED

Here is the log towards the end:

12/10/10 16:43:40 INFO mapred.JobClient: Default number of map tasks: 4
12/10/10 16:43:40 INFO mapred.JobClient: Default number of reduce tasks: 0
12/10/10 16:43:41 INFO mapred.JobClient: Setting group to hadoop
12/10/10 16:43:41 INFO input.FileInputFormat: Total input paths to process : 1
12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat: Target numMapTasks=4
12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat: Total input bytes=52
12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat: maxSplitSize=13
12/10/10 16:43:41 INFO input.FileInputFormat: Total input paths to process : 1
12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat: Generated splits:
12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat: Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52 Locations:ip-XX-XX-XX-XXX.ec2.internal:;
12/10/10 16:43:41 INFO mapred.JobClient: Running job: job_201210101503_0019
12/10/10 16:43:42 INFO mapred.JobClient: map 0% reduce 0%
12/10/10 16:47:07 INFO mapred.JobClient: Task Id : attempt_201210101503_0019_m_000000_0, Status : FAILED
java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
    at org.apache.sqoop.mapreduce.ExportOutputFormat.getRecordWriter(ExportOutputFormat.java:79)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:635)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:760)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

Matthieu Labour, Engineering | ActionX | 584 Broadway, Suite 1002 – NY, NY 10012 415-994-3480 (m)
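On the `:0+52` question: the suffix is not part of the path. Hadoop prints a file split as `path:start+length`, so `part-m-00000:0+52` describes a split that starts at byte 0 and covers 52 bytes, which agrees with `Total input bytes=52` in the same log. A small sketch that pulls the three parts apart (the function name `describe_split` is made up for illustration):

```shell
# Hypothetical helper: breaks Hadoop's "path:start+length" split notation
# into its three parts using POSIX parameter expansion.
describe_split() {
  split="$1"
  range="${split##*:}"     # text after the last colon, e.g. "0+52"
  path="${split%:*}"       # everything before the last colon
  start="${range%%+*}"     # byte offset where the split begins
  length="${range##*+}"    # number of bytes covered by the split
  printf 'path=%s start=%s length=%s\n' "$path" "$start" "$length"
}

describe_split /mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52
```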
-
Re: Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Jarek Jarcec Cecho 2012-10-10, 17:27

It would be very helpful if you could send us the task log from one of the map tasks that Sqoop executes.

A shot in the dark: Sqoop connects to your database from the map tasks. Given the connection issues, are you sure that you can connect to your database from all nodes in your cluster?

Jarcec
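Because the export map tasks open their own JDBC connections, the reachability question has to be answered on each node that runs tasks, not only on the machine where the `sqoop` command is launched. A rough sketch of a TCP-level check that could be run on every node follows. It assumes `bash` and the coreutils `timeout` command are installed; the function name `check_db_port` is hypothetical, and `hostname` is the same placeholder used in the thread. A successful TCP connect does not prove that the MySQL login itself will succeed.

```shell
# Hypothetical helper: reports whether a TCP connection to host:port can be
# opened within 5 seconds. It relies on bash's /dev/tcp pseudo-device, so it
# needs bash and coreutils' timeout, but no MySQL client.
check_db_port() {
  host="$1"
  port="$2"
  # $0 and $1 inside the inner script are the host and port passed after it.
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Example call, to be run on each node ("hostname" is a placeholder):
# check_db_port hostname 3306
```

If the master node reports `reachable` while other nodes report `unreachable`, that would be consistent with the hang at `map 0% reduce 0%` followed by a communications link failure.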
-
Re: Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Matthieu Labour 2012-10-10, 18:06

Jarcec

I am quite new to Hadoop and Amazon EMR. Where are those files located?

Here is what I am doing:

1) I am using Amazon Elastic MapReduce and I have created a new job that does not terminate and whose type is HBase.

2) I get the job id:
myaccount@ubuntu:~/elastic-mapreduce-cli$ ./elastic-mapreduce --list --active
j-3EFP15LBJC8R4 RUNNING ec2-XXX-XX-XXX-XX.compute-1.amazonaws.com sqooping
COMPLETED Setup Hadoop Debugging
COMPLETED Start HBase
COMPLETED Setup Hive
RUNNING Setup Pig

3) I attach and run a step:
./elastic-mapreduce -j j-3EFP15LBJC8R4 --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar --arg s3://mybucket/sqoop/sqoop.sh

4) I ssh into the machine:
ssh -i ~/.ec2/MYKEY.pem [EMAIL PROTECTED]

5) tail -f /mnt/var/lib/hadoop/steps/6/stderr shows the MapReduce job hanging:
12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat: Generated splits:
12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat: Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52 Locations:ip-10-77-70-192.ec2.internal:;
12/10/10 17:46:58 INFO mapred.JobClient: Running job: job_201210101503_0024
12/10/10 17:46:59 INFO mapred.JobClient: map 0% reduce 0%

6) In /mnt/var/lib/hadoop/steps/6 there is the sqoop.sh script file with:
~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect jdbc:mysql://hostname:3306/analyticsdb --username username --password password --table ml_ys_log_gmt_test --export-dir =hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01 --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch

On that same machine, in the same location (/mnt/var/lib/hadoop/steps/6), the following command works:
mysql -h hostname -P 3306 -u username -p
password: password
Afterwards I can use the database, describe the table, etc.

Please note that the MySQL server is running on Amazon RDS and I have added the ElasticMapReduce-master security group to RDS.

Thank you for your help
-
Re: Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Matthieu Labour 2012-10-10, 21:22

Hi Jarcec

If I use the PostgreSQL JDBC connector and connect to one of our Heroku machines, then Sqoop works:

~/$SQOOP_ROOT/bin/sqoop export --connect jdbc:postgresql://ec2-XX-XX-XXX-XX.compute-1.amazonaws.com:database --username username --password password --table ml_ys_log_gmt_test --export-dir -export-dir =hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01 --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch
-
Re: Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Jarek Jarcec Cecho 2012-10-10, 23:58

Hi sir,

I have zero experience with Amazon services, so I'm afraid I can't help you much in navigating to the map task logs. On a normal Hadoop cluster there is a service called "Job Tracker" that serves as the central place for MapReduce jobs. I expect that you should be able to find this web service, or something similar, somewhere on EMR. You should see the jobs executed by Hadoop there, and you should also be able to get to the individual task logs.

Following up on my previous shot in the dark: how is the MySQL user that you're using for Sqoop defined? I'm very interested in the host part of the user. For example, there are usually users like root@localhost or jarcec@'%'. If your host part (in my examples, localhost or '%') is too restrictive, your Hadoop nodes might not be able to connect to that MySQL box, resulting in connection failures.

Jarcec
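To illustrate why the host part matters: MySQL accounts are `user@host` pairs, and in the host part `%` matches any sequence of characters. The sketch below imitates that comparison with shell patterns. It is a simplification that handles only the `%` and `_` wildcards and ignores IP netmasks, name resolution, and case-insensitivity, and `host_matches` is a hypothetical name. The client host name is the one that appears in the task log earlier in the thread.

```shell
# Hypothetical helper: simplified imitation of how MySQL compares the host
# part of an account with the host a client connects from. "%" matches any
# sequence of characters and "_" matches exactly one character.
host_matches() {
  pattern="$1"
  client="$2"
  # Translate the MySQL wildcards into shell glob wildcards.
  glob="$(printf '%s' "$pattern" | sed -e 's/%/*/g' -e 's/_/?/g')"
  case "$client" in
    $glob) echo "match" ;;
    *)     echo "no match" ;;
  esac
}

host_matches 'localhost' 'ip-10-77-70-192.ec2.internal'   # prints "no match"
host_matches '%'         'ip-10-77-70-192.ec2.internal'   # prints "match"
```

An account defined only for `localhost` would therefore reject connections arriving from the cluster nodes, while an account with `%` would accept them, subject to the usual password and privilege checks.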
-
Re: Need help and tips for tthe following issue: No data get exported from hadoop to mysql using sqoop.Matthieu Labour 2012-10-11, 14:39
Jarcec,

Thank you for your reply. I have a hard time believing that this is a JDBC connection issue, because when I execute the sqoop export command it successfully runs

Executing SQL statement: SELECT t.* FROM `ml_ys_log_gmt_test` AS t LIMIT 1

and if I change the password in the sqoop export command, I get

java.sql.SQLException: Access denied for user

So sqoop export seems to be able to reach the MySQL machine with that username and password. I will use PostgreSQL for now, as it works for me!

Thank you for your help.

On Wed, Oct 10, 2012 at 7:58 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> Hi sir,
> I have actually zero experience with amazon services, so I'm afraid that I
> can't much help you navigate to the map tasks logs. Usually on normal
> hadoop cluster, there is service call "Job Tracker" that is serving as
> central place for mapreduce jobs. I'm expecting that you should be able to
> find this webservice or something similar somehow somewhere. You should see
> job executed by hadoop there and you also should be able to get to
> individual task logs.
>
> Following my previous blind shoot - How is defined MySQL user that you're
> using for Sqoop? I'm very interested to know the host part of the user. For
> example usually there are users like root@localhost or jarcec@'%'. If
> your host part (in my examples it's localhost or '%') is restrictive enough
> your hadoop nodes might not be capable of connecting to that MySQL box and
> thus resulting in connection failures.
>
> Jarcec
>
> On Wed, Oct 10, 2012 at 05:22:14PM -0400, Matthieu Labour wrote:
> > Hi Jarcek
> > If i use the postgresql jdbc connector and connect to one of our heroku
> > machine then scoop works
> > ~/$SQOOP_ROOT/bin/sqoop export --connect
> > jdbc:postgresql://ec2-XX-XX-XXX-XX.compute-1.amazonaws.com:database
> > --username username --password password --table ml_ys_log_gmt_test
> > --export-dir -export-dir
> > =hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
> > --input-fields-terminated-by='\t'
> > --lines-terminated-by='\n' --verbose --batch
> >
> > On Wed, Oct 10, 2012 at 2:06 PM, Matthieu Labour <[EMAIL PROTECTED]> wrote:
> >
> > > Jarcek
> > >
> > > I am quite new to hadoop and amazon EMR. Where are those files located?
> > >
> > > Here is what I am doing:
> > >
> > > 1) I am using amazon elastic map reduce and I have created a New Job that
> > > does not terminate and whose type is HBase
> > >
> > > 2) I get the job id
> > > myaccount@ubuntu:~/elastic-mapreduce-cli$ ./elastic-mapreduce --list --active
> > > j-3EFP15LBJC8R4 RUNNING ec2-XXX-XX-XXX-XX.compute-1.amazonaws.com sqooping
> > > COMPLETED Setup Hadoop Debugging
> > > COMPLETED Start HBase
> > > COMPLETED Setup Hive
> > > RUNNING Setup Pig
> > >
> > > 3) I attach and run a step:
> > > ./elastic-mapreduce -j j-3EFP15LBJC8R4 --jar
> > > s3://elasticmapreduce/libs/script-runner/script-runner.jar --arg
> > > s3://mybucket/sqoop/sqoop.sh
> > >
> > > 4) I ssh the machine. ssh -i ~/.ec2/MYKEY.pem [EMAIL PROTECTED]
> > >
> > > 5) tail -f /mnt/var/lib/hadoop/steps/6/stderr shows the mapreduce job hanging
> > > 12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat: Generated splits:
> > > 12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat: Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52 Locations:ip-10-77-70-192.ec2.internal:;
> > > 12/10/10 17:46:58 INFO mapred.JobClient: Running job: job_201210101503_0024
> > > 12/10/10 17:46:59 INFO mapred.JobClient: map 0% reduce 0%
> > >
> > > 6) In /mnt/var/lib/hadoop/steps/6 there is the scoop.sh script file with
> > > ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect
> > > jdbc:mysql://hostname:3306/analyticsdb --username username --password
> > > password --table ml_ys_log_gmt_test --export-dir
> > > =hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01

Matthieu Labour, Engineering | ActionX | 584 Broadway, Suite 1002 – NY, NY 10012 | 415-994-3480 (m)
-
Re: Need help and tips for the following issue: No data get exported from hadoop to mysql using sqoop.
Jarek Jarcec Cecho 2012-10-11, 15:38
Hi sir,
I'm sorry, but it's hard to help without the actual task log, which should contain more details about the exception. I was able to dig up the following Amazon documentation on reaching the Hadoop web UI. Would you mind trying it to see whether you can reach the map task log?

http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingtheHadoopUserInterface.html

Jarcec
-
Re: Need help and tips for the following issue: No data get exported from hadoop to mysql using sqoop.
Jarek Jarcec Cecho 2012-10-15, 21:40
Hi Matt,
thanks for getting back to me with the actual task log. I'm adding the Sqoop user mailing list back in the loop so that others might jump in. I have, however, removed the entire log to prevent disclosure of any sensitive data.

The log contained only the original connection exception, with no further wrapped exceptions that would help find the cause. I would recommend the following:

Connect to each of your slave nodes (ssh) and try to connect to your MySQL box from there, e.g. with something like

mysql -h mysql.server -u myuser -pmypassword database

It should work from each node. If it doesn't (and I expect that it won't), then there might be firewall issues or other networking problems that you will have to solve.

Jarcec

On Mon, Oct 15, 2012 at 12:34:08PM -0400, Matthieu Labour wrote:
> Hi Jarcec
> Please find enclosed the screenshot using the hadoop web interfaces:
> http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingtheHadoopUserInterface.html
> I am sending an email directly to you as the log might contain some info
> that I would rather not have on the web/ mailing list
> The trace is the same as the one I can see when I ssh the master node and
> explore the logs under /mnt/var/log/hadoop/steps/ ...
> If you tell me the best place to add some logs in sqoop, then i can
> recompile and rerun
> The bizarre thing is that the select seems to work.
> Cheers
> Matthieu
> === SENSITIVE CONTENT REMOVED ==
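The per-node check suggested above could be scripted roughly as follows. This is only a minimal sketch under stated assumptions, not a tested EMR procedure: it assumes bash and coreutils `timeout` are present on the nodes, and the function names, node names and database host are hypothetical placeholders that do not come from the thread. It tests only whether the TCP port is reachable, which is what a firewall or security group controls; it does not log in to MySQL.

```shell
#!/usr/bin/env bash
# Sketch: check whether a MySQL endpoint is reachable at the TCP level.
# Assumes bash (for /dev/tcp) and coreutils `timeout` are available.

# can_reach HOST PORT
# Exit status 0 if a TCP connection to HOST:PORT opens within 5 seconds.
can_reach() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_from_nodes DB_HOST DB_PORT NODE...
# Runs the same probe on every node over ssh and prints one line per node.
# Node names and ssh access are placeholders, not values from the thread.
check_from_nodes() {
    db_host=$1; db_port=$2; shift 2
    for node in "$@"; do
        if ssh "$node" "timeout 5 bash -c 'exec 3<>/dev/tcp/$db_host/$db_port'" 2>/dev/null; then
            echo "$node: OK"
        else
            echo "$node: FAILED"
        fi
    done
}
```

Called as, say, `check_from_nodes mydb.example.com 3306 slave1 slave2` (placeholder names), it prints one OK or FAILED line per node. A FAILED from the slaves together with a working connection from the master would point at a firewall or security-group rule rather than at Sqoop itself.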
-
Re: Need help and tips for the following issue: No data get exported from hadoop to mysql using sqoop.
Matthieu Labour 2012-10-17, 21:53
Jarcec,

My bad, my mistake. Amazon attaches the slave and the master nodes to two different security groups, and both had to be added to Amazon RDS. It is working now.

Thank you for your help.
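For readers hitting the same problem, the fix described above amounts to authorizing both EMR security groups on the RDS side. The sketch below only prints the commands instead of running them. It assumes the present-day AWS CLI and an RDS DB security group (the EC2-Classic model in use at the time of this thread); the DB security group name, the account ID and the slave group name `ElasticMapReduce-slave` are assumptions for illustration, since the thread itself only names `ElasticMapReduce-master`.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) one authorization command per EMR security group.
# The DB security group name, the account ID and the slave group name are
# assumptions for illustration, not values confirmed by the thread.

# print_authorize_cmds DB_SECURITY_GROUP OWNER_ACCOUNT_ID EC2_SECURITY_GROUP...
print_authorize_cmds() {
    db_sg=$1; owner_id=$2; shift 2
    for ec2_sg in "$@"; do
        printf 'aws rds authorize-db-security-group-ingress --db-security-group-name %s --ec2-security-group-name %s --ec2-security-group-owner-id %s\n' \
            "$db_sg" "$ec2_sg" "$owner_id"
    done
}

print_authorize_cmds default 123456789012 ElasticMapReduce-master ElasticMapReduce-slave
```

Printing the commands first makes it easy to review which groups would be authorized before touching the database's access rules. On a VPC-based setup the equivalent change would instead be an inbound rule on the database's VPC security group.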
-
Re: Need help and tips for the following issue: No data get exported from hadoop to mysql using sqoop.
Jarek Jarcec Cecho 2012-10-17, 21:58
I'm glad that you've managed to resolve the issue. Happy sqooping!
Jarcec
-
Re: Need help and tips for the following issue: No data get exported from hadoop to mysql using sqoop.
Matthieu Labour 2012-10-17, 22:01
Thank you for the great library!
|