Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop, mail # dev - Review Request: Fix for SQOOP-1029


Copy link to this message
-
Re: Review Request: Fix for SQOOP-1013
Abraham Elmahrek 2013-05-31, 20:30

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11537/#review21237
-----------------------------------------------------------
Seems good except for a few comments and nit picks.

The datetime split logic is when certain intervals are "hotter" than others. IE: 1000000 rows out of 20000000 exist between the date range of december 1st to december 31st, but a user is importing the entire years data, with 12 node. Basically, 1 machine will extract 1000000 rows, while the others will extract 1000000/11 rows. In the future we could probably add some upfront analysis of the data to improve distribution.
connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java
<https://reviews.apache.org/r/11537/#comment44096>

    Nit: This loop can probably handle all partitions by using constructDateConditions(objLB, objUB, false) to constructDateConditions(objLB, objUB, upperBound == maxDateValue). Also, upperBound += (i < remainder) ? 1 : 0;

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java
<https://reviews.apache.org/r/11537/#comment44094>

    Nit: This seems like it can be merged above.

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java
<https://reviews.apache.org/r/11537/#comment44099>

    There doesn't seem to be any thing stopping partitionMaxValue from being null. So, it makes sense to add some null checking here as well.

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java
<https://reviews.apache.org/r/11537/#comment44098>

    There doesn't seem to be any thing stopping partitionMaxValue from being null. So, it makes sense to add some null checking here as well.
- Abraham Elmahrek
On May 30, 2013, 6:25 p.m., Venkat Ranganathan wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11537/
> -----------------------------------------------------------
>
> (Updated May 30, 2013, 6:25 p.m.)
>
>
> Review request for Sqoop.
>
>
> Description
> -------
>
> This addresses Boolean, date, time, and timestamp splitters.
>
> THis also disallows char type splitters as discussed in SQOOP-976
>
>
> Diffs
> -----
>
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java f80f30d
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportPartitioner.java ee314d0
>
> Diff: https://reviews.apache.org/r/11537/diff/
>
>
> Testing
> -------
>
> Introduced new unit tests to test new functionality
> All tests pass
>
>
> Thanks,
>
> Venkat Ranganathan
>
>