Sqoop, mail # user - /tmp dir for import configurable?


Christian Prokopp 2013-03-28, 15:35
Alexander Alten-Lorenz 2013-03-28, 15:50
Christian Prokopp 2013-03-28, 15:54
Jarek Jarcec Cecho 2013-03-28, 21:49
Christian Prokopp 2013-04-02, 10:38

Re: /tmp dir for import configurable?
Jarek Jarcec Cecho 2013-04-06, 05:05
Hi Christian,
thank you very much for sharing the log, and please accept my apologies for the late response.

Looking closely at your exception, I can confirm that it is the S3 file system, not Sqoop itself, that is creating the files in /tmp.

> [hostaddress] out: 13/04/02 01:56:03 INFO s3native.NativeS3FileSystem:
> OutputStream for key 'some_table/_SUCCESS' writing to tempfile '*
> /tmp/hadoop-jenkins/s3/output-1400873345908825433.tmp*'

Taking a brief look at the source code [1], it seems that the method newBackupFile(), defined on line 195, is responsible for creating the temporary file. It also seems that its behaviour can be altered using the fs.s3.buffer.dir property. Would you mind trying it in your Sqoop execution?

  sqoop import -Dfs.s3.buffer.dir=/custom/path ...
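As a fuller sketch (connection details, paths, and the bucket name below are placeholders, not taken from your setup), note that -D generic options are handled by Hadoop's GenericOptionsParser and must come before the Sqoop-specific arguments:

```shell
# Sketch only: substitute your own JDBC URL, credentials, table, and bucket.
# -D options must precede the tool-specific flags; --direct selects the
# mysqldump fast path seen in your log (MySQLDumpMapper).
sqoop import \
  -Dfs.s3.buffer.dir=/custom/path \
  --connect jdbc:mysql://dbhost/some_db \
  --username user -P \
  --table some_table \
  --target-dir s3n://bucket/some_table \
  --direct
```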

I've also noticed that you're using the LocalJobRunner, which suggests that Sqoop is executing all jobs locally on your machine rather than on your Hadoop cluster. I would recommend checking your Hadoop configuration if your intention is to run the data transfer in parallel.
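To illustrate what to look for (the file path below is an assumption; adjust it for your install), a mapred-site.xml whose mapred.job.tracker is "local", or which lacks the property entirely, makes MapReduce fall back to LocalJobRunner on the submitting machine:

```shell
# Illustrative config written to a temp path; in a real install you would
# inspect /etc/hadoop/conf/mapred-site.xml (path is an assumption) instead.
cat > /tmp/mapred-site-example.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value> <!-- point at e.g. jobtracker-host:8021 for cluster mode -->
  </property>
</configuration>
EOF
# Print the property and its value; "local" here means in-process execution.
grep -A1 'mapred.job.tracker' /tmp/mapred-site-example.xml
```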

Jarcec

Links:
1: http://hadoop.apache.org/docs/r2.0.3-alpha/api/src-html/org/apache/hadoop/fs/s3native/NativeS3FileSystem.html

On Tue, Apr 02, 2013 at 11:38:35AM +0100, Christian Prokopp wrote:
> Hi Jarcec,
>
> I am running the command on the CLI of a cluster node. It appears to run a
> local MR job writing the results to /tmp before sending it to S3:
>
> [..]
> [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
> Beginning mysqldump fast path import
> [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
> Performing import of table image from database some_db
> [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
> Converting data to use specified delimiters.
> [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper: (For
> the fastest possible import, use
> [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
> --mysql-delimiters to specify the same field
> [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
> delimiters as are used by mysqldump.)
> [hostaddress] out: 13/04/02 01:52:54 INFO mapred.LocalJobRunner:
> [hostaddress] out: 13/04/02 01:52:55 INFO mapred.JobClient:  map 100%
> reduce 0%
> [hostaddress] out: 13/04/02 01:52:57 INFO mapred.LocalJobRunner:
> [..]
> [hostaddress] out: 13/04/02 01:53:03 INFO mapred.LocalJobRunner:
> [hostaddress] out: 13/04/02 01:54:42 INFO mapreduce.MySQLDumpMapper:
> Transfer loop complete.
> [hostaddress] out: 13/04/02 01:54:42 INFO mapreduce.MySQLDumpMapper:
> Transferred 668.9657 MB in 113.0105 seconds (5.9195 MB/sec)
> [hostaddress] out: 13/04/02 01:54:42 INFO mapred.LocalJobRunner:
> [hostaddress] out: 13/04/02 01:54:42 INFO s3native.NativeS3FileSystem:
> OutputStream for key
> 'some_table/_temporary/_attempt_local555455791_0001_m_000000_0/part-m-00000'
> closed. Now beginning upload
> [hostaddress] out: 13/04/02 01:54:42 INFO mapred.LocalJobRunner:
> [hostaddress] out: 13/04/02 01:54:45 INFO mapred.LocalJobRunner:
> [hostaddress] out: 13/04/02 01:55:31 INFO s3native.NativeS3FileSystem:
> OutputStream for key
> 'some_table/_temporary/_attempt_local555455791_0001_m_000000_0/part-m-00000'
> upload complete
> [hostaddress] out: 13/04/02 01:55:31 INFO mapred.Task:
> Task:attempt_local555455791_0001_m_000000_0 is done. And is in the process
> of commiting
> [hostaddress] out: 13/04/02 01:55:31 INFO mapred.LocalJobRunner:
> [hostaddress] out: 13/04/02 01:55:31 INFO mapred.Task: Task
> attempt_local555455791_0001_m_000000_0 is allowed to commit now
> [hostaddress] out: 13/04/02 01:55:36 INFO mapred.LocalJobRunner:
> [hostaddress] out: 13/04/02 01:56:03 WARN output.FileOutputCommitter:
> Failed to delete the temporary output directory of task:
> attempt_local555455791_0001_m_000000_0 - s3n://secret@bucketsomewhere
> /some_table/_temporary/_attempt_local555455791_0001_m_000000_0
> [hostaddress] out: 13/04/02 01:56:03 INFO output.FileOutputCommitter: Saved
Christian Prokopp 2013-04-10, 11:20