Sqoop, mail # user - /tmp dir for import configurable?


Christian Prokopp 2013-03-28, 15:35
Alexander Alten-Lorenz 2013-03-28, 15:50
Christian Prokopp 2013-03-28, 15:54
Jarek Jarcec Cecho 2013-03-28, 21:49
Christian Prokopp 2013-04-02, 10:38
Jarek Jarcec Cecho 2013-04-06, 05:05
Re: /tmp dir for import configurable?
Christian Prokopp 2013-04-10, 11:20
Hi Jarcec,

Perfect solution. Thank you very much!

Cheers,
Christian
On Sat, Apr 6, 2013 at 6:05 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:

> Hi Christian,
> thank you very much for sharing the log, and please accept my apologies for
> the late response.
>
> Closely looking into your exception, I can confirm that it's the S3 file
> system that is creating the files in /tmp and not Sqoop itself.
>
> > [hostaddress] out: 13/04/02 01:56:03 INFO s3native.NativeS3FileSystem:
> > OutputStream for key 'some_table/_SUCCESS' writing to tempfile
> > '/tmp/hadoop-jenkins/s3/output-1400873345908825433.tmp'
>
> Taking a brief look into the source code [1], it seems that it's the
> method newBackupFile() defined on line 195 that is responsible for creating
> the temporary file. It also seems that its behaviour can be altered
> using the fs.s3.buffer.dir property. Would you mind trying it in your Sqoop
> execution?
>
>   sqoop import -Dfs.s3.buffer.dir=/custom/path ...
>
> I've also noticed that you're using the LocalJobRunner, which suggests that
> Sqoop is executing all jobs locally on your machine and not on your Hadoop
> cluster. I would recommend checking your Hadoop configuration if your
> intention is to run the data transfer in parallel.
>
> Jarcec
>
> Links:
> 1:
> http://hadoop.apache.org/docs/r2.0.3-alpha/api/src-html/org/apache/hadoop/fs/s3native/NativeS3FileSystem.html
>
> On Tue, Apr 02, 2013 at 11:38:35AM +0100, Christian Prokopp wrote:
> > Hi Jarcec,
> >
> > I am running the command on the CLI of a cluster node. It appears to run a
> > local MR job, writing the results to /tmp before sending it to S3:
> >
> > [..]
> > [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
> > Beginning mysqldump fast path import
> > [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
> > Performing import of table image from database some_db
> > [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
> > Converting data to use specified delimiters.
> > [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper: (For
> > the fastest possible import, use
> > [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
> > --mysql-delimiters to specify the same field
> > [hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper:
> > delimiters as are used by mysqldump.)
> > [hostaddress] out: 13/04/02 01:52:54 INFO mapred.LocalJobRunner:
> > [hostaddress] out: 13/04/02 01:52:55 INFO mapred.JobClient:  map 100%
> > reduce 0%
> > [hostaddress] out: 13/04/02 01:52:57 INFO mapred.LocalJobRunner:
> > [..]
> > [hostaddress] out: 13/04/02 01:53:03 INFO mapred.LocalJobRunner:
> > [hostaddress] out: 13/04/02 01:54:42 INFO mapreduce.MySQLDumpMapper:
> > Transfer loop complete.
> > [hostaddress] out: 13/04/02 01:54:42 INFO mapreduce.MySQLDumpMapper:
> > Transferred 668.9657 MB in 113.0105 seconds (5.9195 MB/sec)
> > [hostaddress] out: 13/04/02 01:54:42 INFO mapred.LocalJobRunner:
> > [hostaddress] out: 13/04/02 01:54:42 INFO s3native.NativeS3FileSystem:
> > OutputStream for key
> > 'some_table/_temporary/_attempt_local555455791_0001_m_000000_0/part-m-00000'
> > closed. Now beginning upload
> > [hostaddress] out: 13/04/02 01:54:42 INFO mapred.LocalJobRunner:
> > [hostaddress] out: 13/04/02 01:54:45 INFO mapred.LocalJobRunner:
> > [hostaddress] out: 13/04/02 01:55:31 INFO s3native.NativeS3FileSystem:
> > OutputStream for key
> > 'some_table/_temporary/_attempt_local555455791_0001_m_000000_0/part-m-00000'
> > upload complete
> > [hostaddress] out: 13/04/02 01:55:31 INFO mapred.Task:
> > Task:attempt_local555455791_0001_m_000000_0 is done. And is in the process
> > of commiting
> > [hostaddress] out: 13/04/02 01:55:31 INFO mapred.LocalJobRunner:
> > [hostaddress] out: 13/04/02 01:55:31 INFO mapred.Task: Task
> > attempt_local555455791_0001_m_000000_0 is allowed to commit now
> > [hostaddress] out: 13/04/02 01:55:36 INFO mapred.LocalJobRunner:
> > [hostaddress] out: 13/04/02 01:56:03 WARN output.FileOutputCommitter:

Best regards,

Christian Prokopp
Data Scientist, PhD
Rangespan Ltd. <http://www.rangespan.com/>
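
The fs.s3.buffer.dir override suggested above can also be made permanent rather
than passed with -D on every invocation. A minimal sketch of the corresponding
core-site.xml entry follows; /data/s3-buffer is only a placeholder for a local
directory with enough free space, not a path taken from this thread:

  <!-- core-site.xml: move NativeS3FileSystem's local buffer files
       out of /tmp (the path below is an example, adjust as needed) -->
  <property>
    <name>fs.s3.buffer.dir</name>
    <value>/data/s3-buffer</value>
  </property>

With that in place, the same sqoop import command should buffer its output under
/data/s3-buffer before uploading to S3.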
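Likewise, the mapred.LocalJobRunner lines in the log indicate that the MapReduce
framework defaulted to local execution (in classic MR1 this happens when
mapred.job.tracker is left at its default value of "local"). A minimal sketch of
the client-side setting that points job submission at a real JobTracker instead;
the host and port are placeholders and depend on the cluster:

  <!-- mapred-site.xml on the node where sqoop is invoked;
       jobtracker.example.com:8021 is a placeholder address -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>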