Pig, mail # user - Custom DB Loader UDF

Custom DB Loader UDF
Terry Siu 2012-08-31, 21:02
Hi all,

I know this question has probably been asked many times, but I'm having difficulty figuring out a couple of aspects of a custom LoadFunc that reads from a DB. And yes, I did try to Google my way to an answer.

Anyhoo, for what it's worth, I have a MySQL table that I wish to load via Pig. I have the LoadFunc working using PigServer in a Java app, but I noticed the following when my job gets submitted to my MR cluster. My custom InputFormat generates 6 InputSplits, each specifying a non-overlapping range/page of records to read. I thought that each InputSplit would correspond to one map task, but what I see in the JobTracker is that the submitted job has only 1 map task, which processes all of the splits serially.

Is my understanding correct that each split is assigned to its own map task? If so, how can I force the submitted MR job to run each of my splits in its own map task?
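
For readers following along, here is a minimal sketch of the kind of paging described above: dividing a table's rows into non-overlapping ranges, one per split. It is not the poster's code. The class name PageRanges, the Range type, and the split method are hypothetical, and the sketch deliberately leaves out the Hadoop and Pig classes. In a real InputFormat, each range would presumably be wrapped in an InputSplit subclass returned from getSplits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig or Hadoop: computes the
// non-overlapping page of rows that each split would read.
public class PageRanges {

    /** One page of rows: a starting offset and a row count. */
    public static final class Range {
        public final long offset;
        public final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Divides totalRows into at most numSplits contiguous ranges.
     * The first (totalRows % numSplits) ranges get one extra row,
     * so the ranges cover every row exactly once.
     */
    public static List<Range> split(long totalRows, int numSplits) {
        if (totalRows < 0 || numSplits <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and numSplits > 0");
        }
        List<Range> ranges = new ArrayList<>();
        long base = totalRows / numSplits;
        long extra = totalRows % numSplits;
        long offset = 0;
        for (int i = 0; i < numSplits; i++) {
            long length = base + (i < extra ? 1 : 0);
            if (length == 0) {
                break; // fewer rows than splits: skip empty ranges
            }
            ranges.add(new Range(offset, length));
            offset += length;
        }
        return ranges;
    }
}
```

Each Range maps naturally onto a query of the form "LIMIT length OFFSET offset" against the MySQL table, assuming a stable ORDER BY so that pages do not overlap between queries.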

Ruslan Al-Fakikh 2012-08-31, 21:44
Terry Siu 2012-08-31, 22:01
Russell Jurney 2012-08-31, 22:02
Terry Siu 2012-08-31, 22:12
Russell Jurney 2012-08-31, 23:03
Ruslan Al-Fakikh 2012-08-31, 23:50
Russell Jurney 2012-09-01, 00:09
Dmitriy Ryaboy 2012-09-02, 21:17
Ruslan Al-Fakikh 2012-08-31, 22:55