Hadoop >> mail # user >> issue with DBInputFormat


issue with DBInputFormat
Hi,

When using DBInputFormat to unload data from a table to HDFS, I configured
6 map tasks. The remaining 5 tasks ran properly, but map task 0 alone
unloaded the whole table. Please find my observations from debugging below.

Chunk size=855565

Input Splits:

For split0 the start=0 and the end=855565 and the length=855565
For split1 the start=855565 and the end=1711130 and the length=855565
For split2 the start=1711130 and the end=2566695 and the length=855565
For split3 the start=2566695 and the end=3422260 and the length=855565
For split4 the start=3422260 and the end=4277825 and the length=855565
For split5 the start=4277825 and the end=5133394 and the length=855569
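For reference, the split boundaries above are consistent with integer division of the row count by the number of map tasks, with the last split absorbing the remainder. The following is a minimal standalone sketch of that arithmetic, not the Hadoop source; the class and method names are hypothetical, and the row count 5133394 is taken from the end of split5.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reproduces the split arithmetic seen in the debug output.
class SplitSketch {
    // Returns one {start, end} pair per map task.
    static List<long[]> computeSplits(long rowCount, int numTasks) {
        long chunkSize = rowCount / numTasks; // integer division: 5133394 / 6 = 855565
        List<long[]> splits = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            long start = i * chunkSize;
            // The last split runs to the end of the table and takes the remainder.
            long end = (i + 1 == numTasks) ? rowCount : start + chunkSize;
            splits.add(new long[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        int i = 0;
        for (long[] s : computeSplits(5133394L, 6)) {
            System.out.println("split" + (i++) + " start=" + s[0]
                    + " end=" + s[1] + " length=" + (s[1] - s[0]));
        }
    }
}
```

Note that split0 always starts at 0 under this scheme, which is the case the condition discussed further down does not handle.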

Queries fired from individual map tasks based on the splits created:

Map task 0: Select query: select * from emp
Map task 1: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM (
select * from emp ) a WHERE rownum <= 4277825 + 855569 ) WHERE dbif_rno >=
4277825
Map task 2: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM (
select * from emp ) a WHERE rownum <= 855565 + 855565 ) WHERE dbif_rno >=
855565
Map task 3: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM (
select * from emp ) a WHERE rownum <= 1711130 + 855565 ) WHERE dbif_rno >=
1711130
Map task 4: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM (
select * from emp ) a WHERE rownum <= 2566695 + 855565 ) WHERE dbif_rno >=
2566695
Map task 5: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM (
select * from emp ) a WHERE rownum <= 3422260 + 855565 ) WHERE dbif_rno >=
3422260

The query executed by map task 0 is the cause of the problem: it has no
row limits, so that task queries all the rows in the table.

The below condition
in org.apache.hadoop.mapreduce.lib.db.OracleDBRecordReader.getSelectQuery()

if (split.getLength() > 0 && *split.getStart() > 0*) {
...
...}

should be:
if (split.getLength() > 0 && *split.getStart() >= 0*) {
...
...}
By overriding getSelectQuery() I was able to work around the issue.
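To illustrate the effect of the two conditions, here is a small standalone sketch that mimics the query wrapping seen in the map task output above. It is not the actual OracleDBRecordReader code and does not depend on Hadoop; the class name, the method name, and the includeZeroStart flag are hypothetical and exist only to switch between the original check (start > 0) and the proposed one (start >= 0).

```java
// Hypothetical standalone imitation of the ROWNUM wrapping, for illustration only.
class QuerySketch {
    static String wrap(String baseQuery, long start, long length, boolean includeZeroStart) {
        // Original condition: start > 0. Proposed condition: start >= 0.
        boolean startOk = includeZeroStart ? start >= 0 : start > 0;
        if (!(length > 0 && startOk)) {
            return baseQuery; // no limits applied: the task reads the whole table
        }
        return "SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( " + baseQuery
                + " ) a WHERE rownum <= " + start + " + " + length
                + " ) WHERE dbif_rno >= " + start;
    }

    public static void main(String[] args) {
        String base = "select * from emp";
        // split0 with the original condition: the query comes back unwrapped.
        System.out.println(wrap(base, 0L, 855565L, false));
        // split0 with the proposed condition: the query is limited like the others.
        System.out.println(wrap(base, 0L, 855565L, true));
    }
}
```

With the original check, split0 (start=0) falls through and returns the bare query, which matches what map task 0 fired. With the proposed check it gets the same ROWNUM limits as the other splits.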
Has anybody faced a similar issue?
Cheers!
Manoj.

 