Pig >> mail # user >> Custom DB Loader UDF

Custom DB Loader UDF
Hi all,

I know this question has probably been posed multiple times, but I'm having difficulty figuring out a couple of aspects of a custom LoaderFunc that reads from a DB. And yes, I did try to Google my way to an answer.

Anyhoo, for what it's worth, I have a MySQL table that I wish to load via Pig. I have the LoaderFunc working using PigServer in a Java app, but I noticed the following when my job gets submitted to my MR cluster. My custom InputFormat generates 6 InputSplits, each of which specifies a non-overlapping range/page of records to read. I expected each InputSplit to correspond to its own map task, but the JobTracker shows that the submitted job has only 1 map task, which processes all of the splits serially.

Is my understanding correct that each split should be assigned to a single map task? If so, how can I coerce the submitted MR job to run each of my splits in its own map task?
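To make the split layout concrete, here is a minimal sketch of how a table could be divided into non-overlapping pages. It is an illustration only, not my actual InputFormat: the class `PageRanges`, its `Page` type, and the row count of 100 are all hypothetical, and the Hadoop wiring is left out so the snippet runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the actual loader code): divides a table of
// totalRows rows into numSplits contiguous, non-overlapping pages.
class PageRanges {

    /** One page of records: starting row offset and number of rows. */
    static final class Page {
        final long offset;
        final long length;

        Page(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            // MySQL-style paging clause for this page.
            return "LIMIT " + length + " OFFSET " + offset;
        }
    }

    /** Returns numSplits pages that together cover rows [0, totalRows). */
    static List<Page> split(long totalRows, int numSplits) {
        if (totalRows < 0 || numSplits <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and numSplits > 0");
        }
        List<Page> pages = new ArrayList<>();
        long base = totalRows / numSplits;       // rows every page gets
        long remainder = totalRows % numSplits;  // first 'remainder' pages get one extra row
        long offset = 0;
        for (int i = 0; i < numSplits; i++) {
            long length = base + (i < remainder ? 1 : 0);
            pages.add(new Page(offset, length));
            offset += length;
        }
        return pages;
    }

    public static void main(String[] args) {
        // Assumed example: 100 rows divided into 6 pages.
        for (Page p : split(100, 6)) {
            System.out.println(p);
        }
    }
}
```

In a real InputFormat, each such page would presumably back one InputSplit returned from `getSplits`, with the record reader issuing the paged query for its own range.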