Sqoop >> mail # user >> Strange distribution of keys among mappers


Re: Strange distribution of keys among mappers
Hey David,

Here's the algorithm:
Split lengths are defined by (max - min)/(# mappers) and whatever is left
is tacked on at the end. So in this case, (288272191-2110)/3 = 96090027
(the division happens to be exact here; I'm assuming any fractional part
would be rounded down), so split lengths will be of length 96090027. Sqoop
will then create splits with the following points:
(min) + (split length)*(n). We can see that 2110 + 96090027*0 = 2110,
2110 + 96090027*1 = 96092137, 2110 + 96090027*2 = 192182164, and
2110 + 96090027*3 = 288272191 will be generated based off of this
algorithm. The last point to be added will be 288272192 because the max
value is not part of the generated split points. Then Sqoop will distribute
the keys accordingly based off of these points, as you've pointed out above.
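
For what it's worth, here is a minimal Python sketch of the steps as I've
described them above. It is not Sqoop's actual source: the names
split_points and ranges are made up for illustration, and the floor
division and the extra max + 1 end point are assumptions taken from this
explanation and from the half-open ranges in your output.

```python
def split_points(min_val, max_val, num_mappers):
    """Split points as described in this message (hypothetical helper)."""
    # Assumption: any fractional part of the split length is rounded down.
    length = (max_val - min_val) // num_mappers
    # Generated points: min + length * n for n = 0 .. num_mappers.
    points = [min_val + length * n for n in range(num_mappers + 1)]
    # Assumption: one extra end point past max closes the final range.
    points.append(max_val + 1)
    return points


def ranges(points):
    """Pair consecutive points into half-open [lo, hi) ranges."""
    return list(zip(points, points[1:]))


points = split_points(2110, 288272191, 3)
for lo, hi in ranges(points):
    print(f"[{lo}, {hi})")
```

With min = 2110, max = 288272191 and 3 mappers, this yields four ranges,
the last of which holds only the max key.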

Just to be sure, did you configure sqoop to use 3 mappers?

Hope this helps,
-Abe
On Wed, Jun 19, 2013 at 8:33 AM, David Kincaid <[EMAIL PROTECTED]> wrote:

> We're seeing a strange thing happen with a sqoop import job with the way
> the key range is getting distributed among the 4 mappers that are running.
> The minimum key value is 2110 and the maximum value is 288272191. We are
> getting one mapper that is only getting one record to import. Here is the
> distribution among the mappers:
>
> [2110, 96092137)
> [96092137, 192182164)
> [192182164, 288272191)
> [288272191, 288272192)
>
> You can see that the fourth mapper is given a range with only one value in
> it. Could someone help me understand what is going on?
>
> Thanks,
>
> Dave
>