Re: Review Request: Sqoop2: Export sub directories option

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11588/#review21503
-----------------------------------------------------------
Hi Vasanth,
thank you very much for working on this! I do have a couple of additional notes:

I do have a small concern about the test case TestHdfsExtract, which seems to be written in a way that the context objects are not passed into the job conf. I think that this is an artifact from the early days that we should fix in a nice manner, as it is very likely that we will have to add more inputs in the future.
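
A rough sketch of the direction I have in mind (the property name below is made up, only the idea matters): let the test hand its inputs to the job through the job Configuration, the same path the real framework uses, so that inputs added later flow through without test-specific plumbing.

    import org.apache.hadoop.conf.Configuration;

    // Illustrative only: feed test inputs through the job Configuration rather
    // than a locally constructed context object.
    public class TestJobConfSketch {
      public Configuration buildJobConf(boolean recursive) {
        Configuration conf = new Configuration();
        // Hypothetical property name, not one defined by Sqoop.
        conf.setBoolean("sqoop.test.hdfs.export.recursive", recursive);
        return conf;
      }
    }
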
core/src/main/java/org/apache/sqoop/framework/configuration/InputForm.java
<https://reviews.apache.org/r/11588/#comment44503>

    It seems to me that using an Enum for a simple boolean input is unnecessary overhead. We are adding support for a Boolean model type in SQOOP-1076, but the patch is not yet committed. Do you think that it would make sense to wait and use that instead of the Enum?
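
    For instance, I would imagine the form ending up roughly like this (names and types here are only illustrative, not the final API from SQOOP-1076):

        // Names here are illustrative only, not the final API.
        public class InputFormSketch {
          // Shape of the current patch: an Enum type just to express yes/no.
          public enum Recursive { NO, YES }
          public Recursive recursiveOption;

          // Once SQOOP-1076 adds a Boolean model type, this could be a plain
          // flag on the form instead, with no extra type needed.
          public Boolean recursive;
        }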

execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportPartitioner.java
<https://reviews.apache.org/r/11588/#comment44504>

    It seems that boolean is a more appropriate data type for this variable.

execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportPartitioner.java
<https://reviews.apache.org/r/11588/#comment44694>

    It seems that the code here will first recursively traverse the file system to find all directories, store them in memory, and then iterate over the stored directories to get the list of files. Wouldn't it be faster and less memory intensive to do both passes at the same time? E.g. traverse through the filesystem and build the list of files directly?

execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportPartitioner.java
<https://reviews.apache.org/r/11588/#comment44695>

    Similarly to the above, wouldn't it be simpler and more effective to calculate the size of all files directly rather than building the list of all directories in memory first?
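
    For both HdfsExportPartitioner comments above, roughly what I have in mind is a single recursive pass (names below are only illustrative, not taken from the patch):

        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.List;

        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        // Illustrative sketch: one recursive pass that both collects the data
        // files and accumulates their total size, instead of materializing the
        // list of directories in memory first.
        public class SinglePassLister {
          private final List<Path> files = new ArrayList<Path>();
          private long totalSize = 0;

          public void collect(FileSystem fs, Path dir) throws IOException {
            for (FileStatus status : fs.listStatus(dir)) {
              if (status.isDir()) {
                collect(fs, status.getPath());   // descend into the sub directory
              } else {
                files.add(status.getPath());     // file list built during traversal
                totalSize += status.getLen();    // total size accumulated in the same pass
              }
            }
          }
        }
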
Jarcec

- Jarek Cecho
On June 4, 2013, 11:58 a.m., vasanthkumar wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11588/
> -----------------------------------------------------------
>
> (Updated June 4, 2013, 11:58 a.m.)
>
>
> Review request for Sqoop.
>
>
> Description
> -------
>
>
> Introducing a new option for exporting the sub directories of the given input directory.
>
> Added a new input:
> Input configuration
>
> Input directory: /input
> Recursive:
>   0 : NO
>   1 : YES
> Choose: 0
>
>
> This addresses bug SQOOP-1063.
>     https://issues.apache.org/jira/browse/SQOOP-1063
>
>
> Diffs
> -----
>
>   core/src/main/java/org/apache/sqoop/framework/configuration/InputForm.java d5cbeec
>   core/src/main/resources/framework-resources.properties cebc90e
>   docs/src/site/sphinx/ClientAPI.rst 4f3fda6
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportPartitioner.java b3590dc
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b3e6050
>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 0e8c9f7
>
> Diff: https://reviews.apache.org/r/11588/diff/
>
>
> Testing
> -------
>
> Done
>
>
> Thanks,
>
> vasanthkumar
>
>