Hadoop >> mail # user >> issue with DBInputFormat


issue with DBInputFormat
Hi,

When using DBInputFormat to unload data from a table to HDFS, I configured
6 map tasks. Map task 0 alone unloads the whole table, while the remaining
5 tasks run properly. Here are my observations from debugging.

Chunk size=855565

Input Splits:

For split0 the start=0 and the end=855565 and the length=855565
For split1 the start=855565 and the end=1711130 and the length=855565
For split2 the start=1711130 and the end=2566695 and the length=855565
For split3 the start=2566695 and the end=3422260 and the length=855565
For split4 the start=3422260 and the end=4277825 and the length=855565
For split5 the start=4277825 and the end=5133394 and the length=855569
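The split boundaries above follow from simple integer arithmetic. Below is a minimal Java sketch, assuming the splits are computed as chunkSize = rowCount / numSplits, with the last split absorbing the remainder. It also assumes a total row count of 5133394, which is inferred from split5's end value and not stated anywhere else. The class name SplitMath is hypothetical.

```java
// Hypothetical sketch (not Hadoop API) of DBInputFormat-style splitting:
// a row count is divided into numSplits contiguous ranges.
// The row count 5133394 is an assumption inferred from split5's end value.
class SplitMath {

    // Returns {start, end} pairs; length of each split = end - start.
    static long[][] computeSplits(long count, int numSplits) {
        long chunkSize = count / numSplits;   // integer division: 855565
        long[][] splits = new long[numSplits][2];
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            // The last split absorbs the remainder (count % numSplits rows).
            long end = (i + 1 == numSplits) ? count : start + chunkSize;
            splits[i][0] = start;
            splits[i][1] = end;
        }
        return splits;
    }

    public static void main(String[] args) {
        long[][] splits = computeSplits(5133394L, 6);
        for (int i = 0; i < splits.length; i++) {
            long start = splits[i][0];
            long end = splits[i][1];
            System.out.println("For split" + i + " the start=" + start
                + " and the end=" + end + " and the length=" + (end - start));
        }
    }
}
```

Running main() prints the same six lines as the debug output. 5133394 / 6 = 855565 with remainder 4, so split5 is 4 rows longer than the others.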

Queries fired from individual map tasks based on the splits created:

Map task 0: Select query: select * from emp
Map task 1: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp ) a WHERE rownum <= 4277825 + 855569 ) WHERE dbif_rno >= 4277825
Map task 2: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp ) a WHERE rownum <= 855565 + 855565 ) WHERE dbif_rno >= 855565
Map task 3: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp ) a WHERE rownum <= 1711130 + 855565 ) WHERE dbif_rno >= 1711130
Map task 4: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp ) a WHERE rownum <= 2566695 + 855565 ) WHERE dbif_rno >= 2566695
Map task 5: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp ) a WHERE rownum <= 3422260 + 855565 ) WHERE dbif_rno >= 3422260

The query executed by map task 0 is the one causing the problem: it has no
row limits, so that task reads every row in the table.
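To see why only the first split is affected, here is a hypothetical stand-alone reproduction of the guard quoted below. The class OldOracleQuery and its buildQuery method are illustrative names, not Hadoop API. A split with start == 0 fails the start > 0 test, so it falls through to the unwrapped base query.

```java
// Hypothetical reproduction of the query-building logic observed above.
// Names are illustrative; the real code is in
// OracleDBRecordReader.getSelectQuery().
class OldOracleQuery {

    static String buildQuery(String baseQuery, long start, long length) {
        // Guard as reported: start must be strictly positive, so the
        // first split (start == 0) skips the ROWNUM wrapping entirely.
        if (length > 0 && start > 0) {
            return "SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( "
                + baseQuery + " ) a WHERE rownum <= " + start + " + " + length
                + " ) WHERE dbif_rno >= " + start;
        }
        return baseQuery; // unbounded: returns every row in the table
    }

    public static void main(String[] args) {
        // split0: prints the bare "select * from emp"
        System.out.println(buildQuery("select * from emp", 0L, 855565L));
        // split1: prints the bounded query that map task 2 logged
        System.out.println(buildQuery("select * from emp", 855565L, 855565L));
    }
}
```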

The cause is this condition in
org.apache.hadoop.mapreduce.lib.db.OracleDBRecordReader.getSelectQuery():

if (split.getLength() > 0 && split.getStart() > 0) {
...
}

which excludes the first split, because its start is 0. It should be:

if (split.getLength() > 0 && split.getStart() >= 0) {
...
}

By overriding getSelectQuery() I was able to work around the issue.
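Below is a minimal sketch of the query such an override might build, written as a stand-alone Java class. FixedOracleQuery, buildQuery and coversRow are hypothetical names. One caveat, assuming Oracle's 1-based ROWNUM: if only the guard changes to start >= 0, the lower bound dbif_rno >= start still places each boundary row (for example row 855565) in two adjacent splits. This sketch instead bounds each split with rownum <= start + length and dbif_rno > start, which keeps the splits disjoint. The partitioning is also only reliable if the inner query returns rows in a stable order, for example via an ORDER BY.

```java
// Hypothetical sketch of a corrected Oracle paging query, as an overridden
// getSelectQuery() might build it. Not Hadoop API; names are illustrative.
class FixedOracleQuery {

    // Wraps baseQuery so it returns only the rows of one split.
    // Oracle ROWNUM starts at 1, so split (start, length) maps to the
    // 1-based row positions start + 1 .. start + length.
    static String buildQuery(String baseQuery, long start, long length) {
        if (length > 0 && start >= 0) {   // first split (start == 0) is bounded too
            long end = start + length;
            return "SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( "
                + baseQuery + " ) a WHERE rownum <= " + end
                + " ) WHERE dbif_rno > " + start;
        }
        return baseQuery;
    }

    // Mirrors the two WHERE clauses above: true if the row at 1-based
    // position rowNum would be returned for this split.
    static boolean coversRow(long rowNum, long start, long length) {
        return rowNum > start && rowNum <= start + length;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("select * from emp", 0L, 855565L));
        System.out.println(buildQuery("select * from emp", 855565L, 855565L));
        // The boundary row 855565 belongs to split0 only.
        System.out.println(coversRow(855565L, 0L, 855565L));       // true
        System.out.println(coversRow(855565L, 855565L, 855565L));  // false
    }
}
```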
Has anybody faced a similar issue?
Cheers!
Manoj.