

Custom DB Loader UDF
Hi all,

I know this question has probably come up before, and I did try to Google my way to an answer, but I'm still having difficulty with a couple of aspects of a custom LoadFunc that reads from a DB. For what it's worth, I have a MySQL table that I wish to load via Pig, and the LoadFunc works when run through PigServer in a Java app.

What I noticed when the job gets submitted to my MR cluster is this: my custom InputFormat generates 6 InputSplits, each covering a non-overlapping range/page of records, and I expected each InputSplit to correspond to its own map task. What the JobTracker shows instead is that the submitted job has only 1 map task, which executes the splits serially. Is my understanding correct that each split should be assigned to its own map task? If so, how can I coerce the submitted MR job into running each of my splits in its own map task?
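For reference, here is a minimal sketch of the kind of InputFormat described above, one that carves the table into six non-overlapping row ranges. The class names, row counts, and the omitted JDBC RecordReader are illustrative assumptions, not the actual code behind this post:

// Sketch only: carves a table into six non-overlapping row ranges,
// each intended to become its own map task. The RecordReader that
// runs the actual JDBC query is omitted.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.pig.data.Tuple;

public class RangeInputFormat extends InputFormat<NullWritable, Tuple> {

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
        long totalRows = 600000L;        // in practice: SELECT COUNT(*) FROM the table
        int numSplits = 6;
        long rowsPerSplit = totalRows / numSplits;

        List<InputSplit> splits = new ArrayList<InputSplit>();
        for (int i = 0; i < numSplits; i++) {
            long start = i * rowsPerSplit;
            long end = (i == numSplits - 1) ? totalRows : start + rowsPerSplit;
            splits.add(new RangeSplit(start, end));   // one page of rows per split
        }
        return splits;
    }

    @Override
    public RecordReader<NullWritable, Tuple> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        throw new UnsupportedOperationException("JDBC RecordReader omitted from this sketch");
    }

    /** A split describing a half-open row range [startRow, endRow). */
    public static class RangeSplit extends InputSplit implements Writable {
        private long startRow;
        private long endRow;

        public RangeSplit() { }                       // required for deserialization

        public RangeSplit(long startRow, long endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public long getLength() { return endRow - startRow; }

        @Override
        public String[] getLocations() { return new String[0]; }  // no data locality for a DB

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(startRow);
            out.writeLong(endRow);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            startRow = in.readLong();
            endRow = in.readLong();
        }
    }
}

In the LoadFunc itself, getInputFormat() would return an instance of this class, and prepareToRead()/getNext() would pull tuples from the RecordReader for whichever split the task was handed.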

Thanks,
-Terry