Charles Earl 2012-12-18, 17:26
Unfortunately, non-trivial joins might lead to unexpected results and issues. One caveat is that Sqoop will run your expensive query in parallel (each mapper executes it with its own range condition), which might cause an undesirable performance hit on the database side. One way to overcome this is to run your expensive non-trivial query prior to the Sqoop import and store its output as a table. For example, in MySQL you can do:
CREATE TABLE sqoop_tmp_table AS SELECT ... JOIN ... JOIN ... JOIN ... JOIN ... JOIN ... (query that you've used originally)
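To make the staging idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption for illustration only, and the `customers`/`orders` tables and their columns are hypothetical. The join is executed once and its result is materialized into `sqoop_tmp_table`, so the parallel mappers only have to scan a plain table afterwards.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (illustration only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source tables; in practice these are whatever your joins touch.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.25);
""")

# Run the expensive join exactly once and store its output as a table.
# Columns are aliased so the staged table has unambiguous names.
cur.execute("""
CREATE TABLE sqoop_tmp_table AS
SELECT o.id AS order_id, c.name AS customer, o.amount AS amount
FROM orders o JOIN customers c ON o.customer_id = c.id
""")
conn.commit()

# The staged table can now be read cheaply, e.g. by range-split mappers.
rows = cur.execute(
    "SELECT order_id, customer, amount FROM sqoop_tmp_table ORDER BY order_id"
).fetchall()
print(rows)
```

You would then point Sqoop at the staging table instead of the free-form query, e.g. `sqoop import --connect ... --table sqoop_tmp_table --split-by order_id`, and drop `sqoop_tmp_table` once the import has finished.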
On Tue, Dec 18, 2012 at 12:26:06PM -0500, Charles Earl wrote:
> Are there any best practices or caveats for including nested joins in free-form query imports?
> I have noted that the documentation says "Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results." I'm relatively new to Sqoop and have not encountered any problems, but I imagine that multi-mapper imports combined with complex joins might produce inconsistent results, since the parallelism seems to depend on range partitioning of the splitting column. Or perhaps I'm overthinking this...
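To illustrate why the parallelism multiplies the cost of the join, here is a simplified model of range partitioning on the split column. It is a sketch, not Sqoop's actual splitter code: the helper `split_conditions`, the `orders`/`customers` tables and the 1..100 id range are hypothetical. Each mapper gets its own WHERE fragment substituted for `$CONDITIONS`, so the full join is issued once per mapper, each time as an independent query.

```python
def split_conditions(col, lo, hi, num_mappers):
    """Return one WHERE fragment per range covering [lo, hi] on `col`.

    Ranges are half-open except the last, which is closed so `hi` is included.
    Simplified model only; Sqoop's real splitters differ in details.
    """
    if num_mappers < 1 or hi < lo:
        raise ValueError("need num_mappers >= 1 and lo <= hi")
    # Evenly spaced integer boundaries; the set drops duplicates that occur
    # when the range is smaller than the number of mappers.
    bounds = sorted({lo + (i * (hi - lo)) // num_mappers for i in range(num_mappers)})
    bounds.append(hi)
    conds = []
    for i in range(len(bounds) - 1):
        low, high = bounds[i], bounds[i + 1]
        upper_op = "<=" if i == len(bounds) - 2 else "<"
        conds.append(f"{col} >= {low} AND {col} {upper_op} {high}")
    return conds


# A free-form query must contain $CONDITIONS; each mapper gets its own range.
query = ("SELECT o.id, c.name FROM orders o JOIN customers c "
         "ON o.customer_id = c.id WHERE $CONDITIONS")
for cond in split_conditions("o.id", 1, 100, 4):
    # Every printed statement re-executes the whole join on the database.
    print(query.replace("$CONDITIONS", f"({cond})"))
```

With four mappers the join runs four times over disjoint id ranges, which is exactly the load that staging the result in a table avoids.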