Sqoop, mail # user - /tmp dir for import configurable?


Christian Prokopp 2013-03-28, 15:35
Alexander Alten-Lorenz 2013-03-28, 15:50
Christian Prokopp 2013-03-28, 15:54
Jarek Jarcec Cecho 2013-03-28, 21:49
Re: /tmp dir for import configurable?
Christian Prokopp 2013-04-02, 10:38
Hi Jarcec,

I am running the command on the CLI of a cluster node. It appears to run a
local MR job that writes the results to /tmp before sending them to S3:

[..]
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
Beginning mysqldump fast path import
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
Performing import of table image from database some_db
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
Converting data to use specified delimiters.
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper: (For
the fastest possible import, use
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
--mysql-delimiters to specify the same field
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
delimiters as are used by mysqldump.)
[hostaddress] out: 13/04/02 01:52:54 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:52:55 INFO mapred.JobClient:  map 100%
reduce 0%
[hostaddress] out: 13/04/02 01:52:57 INFO mapred.LocalJobRunner:
[..]
[hostaddress] out: 13/04/02 01:53:03 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:54:42 INFO mapreduce.MySQLDumpMapper:
Transfer loop complete.
[hostaddress] out: 13/04/02 01:54:42 INFO mapreduce.MySQLDumpMapper:
Transferred 668.9657 MB in 113.0105 seconds (5.9195 MB/sec)
[hostaddress] out: 13/04/02 01:54:42 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:54:42 INFO s3native.NativeS3FileSystem:
OutputStream for key
'some_table/_temporary/_attempt_local555455791_0001_m_000000_0/part-m-00000'
closed. Now beginning upload
[hostaddress] out: 13/04/02 01:54:42 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:54:45 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:55:31 INFO s3native.NativeS3FileSystem:
OutputStream for key
'some_table/_temporary/_attempt_local555455791_0001_m_000000_0/part-m-00000'
upload complete
[hostaddress] out: 13/04/02 01:55:31 INFO mapred.Task:
Task:attempt_local555455791_0001_m_000000_0 is done. And is in the process
of commiting
[hostaddress] out: 13/04/02 01:55:31 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:55:31 INFO mapred.Task: Task
attempt_local555455791_0001_m_000000_0 is allowed to commit now
[hostaddress] out: 13/04/02 01:55:36 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:56:03 WARN output.FileOutputCommitter:
Failed to delete the temporary output directory of task:
attempt_local555455791_0001_m_000000_0 - s3n://secret@bucketsomewhere
/some_table/_temporary/_attempt_local555455791_0001_m_000000_0
[hostaddress] out: 13/04/02 01:56:03 INFO output.FileOutputCommitter: Saved
output of task 'attempt_local555455791_0001_m_000000_0' to
s3n://secret@bucketsomewhere/some_table
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.Task: Task
'attempt_local555455791_0001_m_000000_0' done.
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.LocalJobRunner: Finishing
task: attempt_local555455791_0001_m_000000_0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.LocalJobRunner: Map task
executor complete.
[hostaddress] out: 13/04/02 01:56:03 INFO s3native.NativeS3FileSystem:
OutputStream for key 'some_table/_SUCCESS' writing to tempfile '*
/tmp/hadoop-jenkins/s3/output-1400873345908825433.tmp*'
[hostaddress] out: 13/04/02 01:56:03 INFO s3native.NativeS3FileSystem:
OutputStream for key 'some_table/_SUCCESS' closed. Now beginning upload
[hostaddress] out: 13/04/02 01:56:03 INFO s3native.NativeS3FileSystem:
OutputStream for key 'some_table/_SUCCESS' upload complete
[...deleting cached jars...]
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient: Job complete:
job_local555455791_0001
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient: Counters: 23
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:   File System
Counters
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     FILE:
Number of bytes read=6471451
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     FILE:
Number of bytes written=6623109
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     FILE:
Number of read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     FILE:
Number of large read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     FILE:
Number of write operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     HDFS:
Number of bytes read=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     HDFS:
Number of bytes written=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     HDFS:
Number of read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     HDFS:
Number of large read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     HDFS:
Number of write operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     S3N: Number
of bytes read=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     S3N: Number
of bytes written=773081963
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     S3N: Number
of read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     S3N: Number
of large read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     S3N: Number
of write operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:   Map-Reduce
Framework
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Map input
records=1
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Map output
records=14324124
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Input split
bytes=87
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Spilled
Records=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     CPU time
spent (ms)=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Physical
memory (bytes) snapshot=0
[hostaddress] out: 13/04/02
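
To make the observation above easier to check against a long log, here is a minimal Python sketch that pulls out the local tempfile paths reported by NativeS3FileSystem and recomputes the mysqldump transfer rate. The function name `summarize`, the two regular expressions, and the `SAMPLE` string are illustrative assumptions, not part of Sqoop or Hadoop; the sample lines are copied from the log in this message.

```python
import re

# Matches the s3n "writing to tempfile" message. It tolerates the asterisks
# and the line break that the mail client placed around the path.
TEMPFILE_RE = re.compile(r"writing to tempfile\s+'\*?\s*([^'*\s]+)\*?'")

# Matches the MySQLDumpMapper transfer summary (size in MB, time in seconds).
TRANSFER_RE = re.compile(r"Transferred\s+([\d.]+)\s+MB in\s+([\d.]+)\s+seconds")


def summarize(log_text):
    """Return (tempfile_paths, mb_per_sec) extracted from the log text.

    mb_per_sec is None when no transfer summary line is present.
    """
    # Collapse all whitespace so hard-wrapped log messages match again.
    flat = " ".join(log_text.split())
    paths = TEMPFILE_RE.findall(flat)
    match = TRANSFER_RE.search(flat)
    rate = float(match.group(1)) / float(match.group(2)) if match else None
    return paths, rate


# Lines copied from the log above, including the mail client's wrapping.
SAMPLE = """\
[hostaddress] out: 13/04/02 01:54:42 INFO mapreduce.MySQLDumpMapper:
Transferred 668.9657 MB in 113.0105 seconds (5.9195 MB/sec)
[hostaddress] out: 13/04/02 01:56:03 INFO s3native.NativeS3FileSystem:
OutputStream for key 'some_table/_SUCCESS' writing to tempfile '*
/tmp/hadoop-jenkins/s3/output-1400873345908825433.tmp*'
"""

if __name__ == "__main__":
    paths, rate = summarize(SAMPLE)
    for path in paths:
        print("local buffer file:", path)
    if rate is not None:
        print("transfer rate: %.4f MB/sec" % rate)
```

The extracted path sits under /tmp/hadoop-jenkins/s3, which is consistent with the s3n output stream buffering to local disk before upload. In Hadoop releases of that era this location was, as far as I know, governed by the `fs.s3.buffer.dir` property, which defaults to `${hadoop.tmp.dir}/s3`; treat that as an assumption to verify against your distribution's core-default.xml rather than as a confirmed answer to this thread.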
Jarek Jarcec Cecho 2013-04-06, 05:05
Christian Prokopp 2013-04-10, 11:20