Sqoop >> mail # dev >> Review Request: SQOOP-603: Support small intervals in IntegerSplitter implementation


Jarek Cecho 2012-09-20, 15:37
Cheolsoo Park 2012-09-20, 18:21
Cheolsoo Park 2012-09-20, 16:35
Re: Review Request: SQOOP-603: Support small intervals in IntegerSplitter implementation


> On Sept. 20, 2012, 4:35 p.m., Cheolsoo Park wrote:
> > Hi Jarcec,
> >
> > What if min = 0, max = 1, and numSplits = 5?
> >
> > Following the split() function,
> >
> > splitSize = (1 - 0) / 5 = 0;
> > remainder = (1 - 0) % 5 = 1;
> >
> > After the for loop,
> >
> > splits = (0, 1)
> >
> > Now (maxVal - minVal) <= numSplits is true as (1 - 0) <= 5,
> >
> > so we add maxVal to splits.
> >
> > splits = (0, 1, 1)
> >
> > so we end up with splits as follows:
> >
> > [0, 1)
> > [1, 1) => redundant split that includes no values
> > [1, 1]
> >
> > This case can happen if the user sets -m to an unnecessarily large number, can't it?
> >
> > Please correct me if I am wrong.
> >
> > Thanks!

Hi sir,
You're right about the output of the split() function: it will be (0, 1, 1). However, IntegerSplitter will convert this list into the following two splits:

* 0 <= x < 1
* 1 <= x <= 1

IntegerSplitter always creates n - 1 splits from the list of n split points returned by the split() method in question. I actually tested this scenario against a real MySQL instance: with only 5 values in the target table and 20 mappers requested, I did not end up with any duplicated data.
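
To make the two steps concrete, here is a minimal Java sketch of the behaviour as it is described in this thread. It is not the actual IntegerSplitter code: the class and method names (SplitSketch, splitPoints, toRanges) and the column name x are hypothetical, and the loop is reconstructed only from the arithmetic quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical, not Sqoop's.
final class SplitSketch {

    // Step 1: compute split points. The remainder is spread over the first
    // steps, and one extra maxVal is appended when the interval is no larger
    // than the requested number of splits (the change under review).
    static List<Long> splitPoints(long numSplits, long minVal, long maxVal) {
        List<Long> points = new ArrayList<>();
        long splitSize = (maxVal - minVal) / numSplits;
        long remainder = (maxVal - minVal) % numSplits;
        long cur = minVal;
        for (long i = 0; i <= numSplits; i++) {
            points.add(cur);
            if (cur >= maxVal) {
                break; // reached the upper bound, no more points needed
            }
            cur += splitSize + (i < remainder ? 1 : 0);
        }
        if ((maxVal - minVal) <= numSplits) {
            points.add(maxVal);
        }
        return points;
    }

    // Step 2: turn n split points into n - 1 ranges. Every range is
    // half-open except the last one, which is closed on both ends.
    static List<String> toRanges(List<Long> points, String col) {
        List<String> ranges = new ArrayList<>();
        for (int i = 1; i < points.size(); i++) {
            long lo = points.get(i - 1);
            long hi = points.get(i);
            boolean last = (i == points.size() - 1);
            ranges.add(lo + " <= " + col + (last ? " <= " : " < ") + hi);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // The case from the review: min = 0, max = 1, numSplits = 5.
        List<Long> points = splitPoints(5, 0, 1);
        System.out.println(points);                // [0, 1, 1]
        System.out.println(toRanges(points, "x")); // [0 <= x < 1, 1 <= x <= 1]
    }
}
```

Under these assumptions the duplicated 1 at the end of the list never yields an empty [1, 1) range, because the final pair of points is always emitted as a closed interval.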

Jarcec
- Jarek
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7193/#review11735
-----------------------------------------------------------
On Sept. 20, 2012, 3:37 p.m., Jarek Cecho wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7193/
> -----------------------------------------------------------
>
> (Updated Sept. 20, 2012, 3:37 p.m.)
>
>
> Review request for Sqoop.
>
>
> Description
> -------
>
> I've decided to alter the split() method to add one extra maxVal when the number of split points is less than or equal to the requested split count.
>
>
> This addresses bug SQOOP-603.
>     https://issues.apache.org/jira/browse/SQOOP-603
>
>
> Diffs
> -----
>
>   src/java/org/apache/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 35b74eb
>   src/java/org/apache/sqoop/mapreduce/db/IntegerSplitter.java 8e7a096
>   src/test/org/apache/sqoop/mapreduce/db/TestIntegerSplitter.java 22d5140
>
> Diff: https://reviews.apache.org/r/7193/diff/
>
>
> Testing
> -------
>
> * ant test
> * Real MySQL instance in a couple of scenarios
>
>
> Thanks,
>
> Jarek Cecho
>
>