Re: Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Hi Jarcec,

If I use the PostgreSQL JDBC connector and connect to one of our Heroku
machines, then Sqoop works:
~/$SQOOP_ROOT/bin/sqoop export --connect
jdbc:postgresql://ec2-XX-XX-XXX-XX.compute-1.amazonaws.com/database
--username username --password password --table ml_ys_log_gmt_test
--export-dir hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
--input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch
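
As a quick sanity check that the export directory really exists on HDFS
(assuming the hadoop binary is on the master node's PATH):

# list the partition directory and its part files
hadoop fs -ls hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01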

On Wed, Oct 10, 2012 at 2:06 PM, Matthieu Labour <[EMAIL PROTECTED]> wrote:

>
> Jarcec,
>
> I am quite new to Hadoop and Amazon EMR. Where are those files located?
>
> Here is what I am doing:
>
> 1) I am using Amazon Elastic MapReduce and I have created a new job that
> does not terminate and whose type is HBase.
>
> 2) I get the job id
> myaccount@ubuntu:~/elastic-mapreduce-cli$ ./elastic-mapreduce --list
> --active
> j-3EFP15LBJC8R4     RUNNING
> ec2-XXX-XX-XXX-XX.compute-1.amazonaws.com         sqooping
>    COMPLETED      Setup Hadoop Debugging
>    COMPLETED      Start HBase
>    COMPLETED      Setup Hive
>    RUNNING        Setup Pig
>
> 3) I attach and run a step:
> ./elastic-mapreduce -j j-3EFP15LBJC8R4 --jar
> s3://elasticmapreduce/libs/script-runner/script-runner.jar --arg
> s3://mybucket/sqoop/sqoop.sh
>
> 4) I ssh into the machine: ssh -i ~/.ec2/MYKEY.pem
> [EMAIL PROTECTED]
>
> 5) tail -f /mnt/var/lib/hadoop/steps/6/stderr shows the MapReduce job
> hanging:
> 12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat: Generated splits:
> 12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat:
> Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52
> Locations:ip-10-77-70-192.ec2.internal:;
> 12/10/10 17:46:58 INFO mapred.JobClient: Running job: job_201210101503_0024
> 12/10/10 17:46:59 INFO mapred.JobClient:  map 0% reduce 0%
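>
> In case it helps, here is a rough sketch of how one could pull the log of
> the hanging map attempt (the -list-attempt-ids subcommand is standard
> Hadoop 1.x; the userlogs path is my guess at the default EMR log layout):
>
> # list the running map attempts of the stuck job
> hadoop job -list-attempt-ids job_201210101503_0024 map running
> # then, on the node running that attempt, read its log, e.g.
> cat /mnt/var/log/hadoop/userlogs/job_201210101503_0024/attempt_*/syslog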
>
> 6) In /mnt/var/lib/hadoop/steps/6 there is the sqoop.sh script file with
> ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect
> jdbc:mysql://hostname:3306/analyticsdb --username username --password
> password --table ml_ys_log_gmt_test --export-dir
> =hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
> --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch
>
> On that same machine, same location (/mnt/var/lib/hadoop/steps/6), the
> following command works
> mysql -h hostname -P 3306 -u username -p
> password: password
> Afterwards I can use the database, describe the table, etc.
> Please note the MySQL machine is running on Amazon RDS and I have
> added the ElasticMapReduce-master security group to RDS.
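>
> For what it is worth, a quick way to probe whether a given node can reach
> the RDS endpoint at all (assuming nc is installed on the node; "hostname"
> is the same placeholder as above):
>
> # succeeds only if the node can open a TCP connection to MySQL
> nc -zv hostname 3306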
>
> Thank you for your help
>
>
> On Wed, Oct 10, 2012 at 1:27 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
>
>> It would be very helpful if you could send us the task log from one of
>> the map jobs that Sqoop executes.
>>
>> Shooting blindly: Sqoop connects to your database from the map tasks.
>> Given the connection issues, are you sure that you can connect to your
>> database from all nodes in your cluster?
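>>
>> A rough sketch of such a check, assuming the master can ssh to the slave
>> nodes with its key and that the mysql client is installed on them
>> ("hostname", "username", "password" being your placeholders):
>>
>> # try a trivial query against the database from every datanode
>> for node in $(hadoop dfsadmin -report | awk '/^Name:/ {print $2}' | cut -d: -f1); do
>>   ssh "$node" "mysql -h hostname -P 3306 -u username -ppassword -e 'SELECT 1'" \
>>     && echo "$node ok" || echo "$node FAILED"
>> done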
>>
>> Jarcec
>>
>> On Wed, Oct 10, 2012 at 01:16:03PM -0400, Matthieu Labour wrote:
>> > Hi Jarcec,
>> >
>> > Thank you so much for your help.
>> >
>> > Following your advice, I ran the following command:
>> > ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect
>> > jdbc:mysql://hostname:3306/analyticsdb --username username --password
>> > password --table ml_ys_log_gmt_test --export-dir
>> > hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
>> > --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose
>> >
>> > It seems to find the file to export, so that is good. In the log I see
>> > the following (I am not sure why :0+52 gets appended):
>> > 12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat:
>> > Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52
>> > Locations:ip-XX-XX-XX-XXX.ec2.internal:;
>> >
>> > However, it hangs forever after printing the following:
>> > 12/10/10 16:43:42 INFO mapred.JobClient:  map 0% reduce 0%
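>> >
>> > (Sketch only: while it sits at 0% I can at least poll the job status,
>> > with the job id taken from the "Running job:" line that JobClient prints:)
>> >
>> > hadoop job -status <job_id>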
Matthieu Labour, Engineering | ActionX |
584 Broadway, Suite 1002 – NY, NY 10012
415-994-3480 (m)