Re: Need help and tips for the following issue: No data gets exported from Hadoop to MySQL using Sqoop.
Hi Jarcec,

If I use the PostgreSQL JDBC connector and connect to one of our Heroku
machines, then Sqoop works:
~/$SQOOP_ROOT/bin/sqoop export --connect
jdbc:postgresql://ec2-XX-XX-XXX-XX.compute-1.amazonaws.com:database
--username username --password password --table ml_ys_log_gmt_test
--export-dir=hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
--input-fields-terminated-by='\t'
--lines-terminated-by='\n' --verbose --batch
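
(For reference, the usual JDBC connect string puts the database name after a
slash, jdbc:postgresql://host:port/database; a minimal sketch of the same
export in that form, where the 5432 port is an assumption and the redacted
host and database names stay as placeholders:

~/$SQOOP_ROOT/bin/sqoop export \
  --connect jdbc:postgresql://ec2-XX-XX-XXX-XX.compute-1.amazonaws.com:5432/database \
  --username username --password password --table ml_ys_log_gmt_test \
  --export-dir=hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01 \
  --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch

The backslashes are only line continuations; the flags are unchanged from the
command above.)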

On Wed, Oct 10, 2012 at 2:06 PM, Matthieu Labour <[EMAIL PROTECTED]> wrote:

>
> Jarcek
>
> I am quite new to Hadoop and Amazon EMR. Where are those files located?
>
> Here is what I am doing:
>
> 1) I am using Amazon Elastic MapReduce and I have created a new job flow
> that does not terminate and whose type is HBase.
>
> 2) I get the job flow id:
> myaccount@ubuntu:~/elastic-mapreduce-cli$ ./elastic-mapreduce --list
> --active
> j-3EFP15LBJC8R4     RUNNING
> ec2-XXX-XX-XXX-XX.compute-1.amazonaws.com         sqooping
>    COMPLETED      Setup Hadoop Debugging
>    COMPLETED      Start HBase
>    COMPLETED      Setup Hive
>    RUNNING        Setup Pig
>
> 3) I attach and run a step:
> ./elastic-mapreduce -j j-3EFP15LBJC8R4 --jar
> s3://elasticmapreduce/libs/script-runner/script-runner.jar --arg
> s3://mybucket/sqoop/sqoop.sh
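>
> (To confirm the step was attached, the same CLI can describe the job flow; a
> minimal sketch, assuming the --describe flag of the 2012-era Ruby
> elastic-mapreduce CLI:
>
> ./elastic-mapreduce --describe -j j-3EFP15LBJC8R4
>
> This should list the script-runner step with its PENDING/RUNNING/COMPLETED
> state.)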
>
> 4) I ssh into the machine: ssh -i ~/.ec2/MYKEY.pem
> [EMAIL PROTECTED]
>
> 5) tail -f /mnt/var/lib/hadoop/steps/6/stderr shows the MapReduce job
> hanging:
> 12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat: Generated splits:
> 12/10/10 17:46:58 DEBUG mapreduce.ExportInputFormat:
> Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52
> Locations:ip-10-77-70-192.ec2.internal:;
> 12/10/10 17:46:58 INFO mapred.JobClient: Running job: job_201210101503_0024
> 12/10/10 17:46:59 INFO mapred.JobClient:  map 0% reduce 0%
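>
> (When a job sticks at map 0%, the per-attempt task logs usually say why; a
> minimal sketch for Hadoop 1.x, where the userlogs path is an assumption
> about the EMR log layout:
>
> hadoop job -list
> ls /mnt/var/log/hadoop/userlogs/job_201210101503_0024/
> cat /mnt/var/log/hadoop/userlogs/job_201210101503_0024/attempt_*/syslog
>
> Note the attempt directories live on whichever node ran the task, which on
> EMR is usually a slave node rather than the master.)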
>
> 6) In /mnt/var/lib/hadoop/steps/6 there is the sqoop.sh script file with:
> ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect
> jdbc:mysql://hostname:3306/analyticsdb --username username --password
> password --table ml_ys_log_gmt_test
> --export-dir=hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
> --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch
>
> On that same machine, in the same location (/mnt/var/lib/hadoop/steps/6),
> the following command works:
> mysql -h hostname -P 3306 -u username -p
> password: password
> Afterwards I can use the database, describe the table, etc.
> Please note the MySQL machine is running on Amazon RDS and I have
> added the ElasticMapReduce-master security group to RDS.
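>
> (Since the map tasks run on the slave/core nodes rather than the master, the
> ElasticMapReduce-slave group likely needs the same access; a minimal sketch
> with the 2012-era RDS command line toolkit, where the DB security group name
> and account id are placeholders:
>
> rds-authorize-db-security-group-ingress mydbsecuritygroup \
>   --ec2-security-group-name ElasticMapReduce-slave \
>   --ec2-security-group-owner-id 123456789012
>
> The same grant can also be made from the RDS console.)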
>
> Thank you for your help
>
>
> On Wed, Oct 10, 2012 at 1:27 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
>
>> It would be very helpful if you could send us the task log from one of the
>> map tasks that Sqoop executes.
>>
>> Shooting blindly: Sqoop connects to your database from the map tasks.
>> Given the connection issues, are you sure that you can connect to your
>> database from all nodes in your cluster?
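>>
>> One quick check is to run a bare TCP probe against the database from a
>> slave node instead of the master; a minimal sketch, assuming nc is
>> installed and reusing the internal hostname from your split log:
>>
>> ssh hadoop@ip-10-77-70-192.ec2.internal 'nc -vz hostname 3306'
>>
>> If this times out while the same probe from the master succeeds, the slave
>> nodes' security group is the missing piece.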
>>
>> Jarcec
>>
>> On Wed, Oct 10, 2012 at 01:16:03PM -0400, Matthieu Labour wrote:
>> > Hi Jarcec,
>> >
>> > Thank you so much for your help.
>> >
>> > Following your advice, I ran the following command:
>> > ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect
>> > jdbc:mysql://hostname:3306/analyticsdb --username username --password
>> > password --table ml_ys_log_gmt_test --export-dir
>> > hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
>> > --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose
>> >
>> > It seems to find the file to export, so that is good. In the log I see
>> > the following (I am not sure why :0+52 gets appended):
>> > 12/10/10 16:43:41 DEBUG mapreduce.ExportInputFormat:
>> > Paths:/mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01/part-m-00000:0+52
>> > Locations:ip-XX-XX-XX-XXX.ec2.internal:;
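>> >
>> > (The :0+52 suffix is the split's byte offset and length, offset 0 and
>> > length 52, which Hadoop's FileSplit appends when a split is printed; so
>> > the input here is a single 52-byte file. A quick way to confirm:
>> >
>> > hadoop fs -ls hdfs:///mnt/var/lib/hadoop/dfs/logs_sanitized_test/dt=2012-10-01
>> >
>> > This should list part-m-00000 with size 52.)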
>> >
>> > However, it hangs forever after printing the following:
>> > 12/10/10 16:43:42 INFO mapred.JobClient:  map 0% reduce 0%
Matthieu Labour, Engineering | ActionX |
584 Broadway, Suite 1002 – NY, NY 10012
415-994-3480 (m)