
Pig >> mail # user >> Custom DB Loader UDF

Re: Custom DB Loader UDF
Hi Terry,

I am not sure whether your architecture is correct, but here is what we do
in my team: we override setLocation() in our LoadFunc so that it caches the
DB data to HDFS.
The simplest approach is to copy the data from MySQL to HDFS with Sqoop and
then have Pig read it as a normal input.
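The Sqoop-then-Pig workflow described above could be sketched roughly as follows; the connection string, credentials, table name, paths, and schema are all placeholders, not details from this thread:

```shell
# Import the MySQL table into HDFS with Sqoop
# (host, database, user, table, and target dir are hypothetical).
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username pig_user -P \
  --table my_table \
  --target-dir /data/my_table \
  --num-mappers 6

# Then Pig reads the imported files as ordinary HDFS input, e.g.:
#   A = LOAD '/data/my_table' USING PigStorage(',')
#       AS (id:int, name:chararray);
```

Note that --num-mappers on the Sqoop side already gives you parallel extraction from the database, which sidesteps the split-to-task problem entirely.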


On Sat, Sep 1, 2012 at 1:02 AM, Terry Siu <[EMAIL PROTECTED]> wrote:
> Hi all,
> I know this question has probably been posed multiple times, but I'm having difficulty figuring out a couple of aspects of a custom LoadFunc to read from a DB. And yes, I did try to Google my way to an answer. Anyhoo, for what it's worth, I have a MySQL table that I wish to load via Pig.
>
> I have the LoadFunc working using PigServer in a Java app, but I noticed the following when my job gets submitted to my MR cluster. I generated 6 InputSplits in my custom InputFormat, where each split specifies a non-overlapping range/page of records to read from. I thought that each InputSplit would correspond to a map task, but what I see in the JobTracker is that the submitted job has only 1 map task, which executes each split serially.
>
> Is my understanding correct that each split can be assigned to its own map task? If so, how can I coerce the submitted MR job to execute each of my splits in its own map task?
> Thanks,
> -Terry
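The non-overlapping range/page partitioning Terry describes for his 6 splits can be sketched in plain Java as below. This is only an illustration of the pagination arithmetic an InputFormat's getSplits() might use; the class and method names (SplitRanges, PageRange, computeRanges) are hypothetical and not from his code:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitRanges {

    // A half-open [startRow, endRow) range that one split would read,
    // e.g. via "... LIMIT size OFFSET startRow" against the DB.
    static final class PageRange {
        final long startRow;
        final long endRow;
        PageRange(long startRow, long endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    // Partition totalRows into numSplits non-overlapping ranges that
    // together cover the whole table, the same pagination idea behind
    // generating 6 InputSplits in a custom InputFormat.
    static List<PageRange> computeRanges(long totalRows, int numSplits) {
        List<PageRange> ranges = new ArrayList<>();
        long base = totalRows / numSplits;
        long remainder = totalRows % numSplits;
        long start = 0;
        for (int i = 0; i < numSplits; i++) {
            // Spread the remainder over the first few ranges so sizes
            // differ by at most one row.
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new PageRange(start, start + size));
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (PageRange r : computeRanges(10_000, 6)) {
            System.out.println("[" + r.startRow + ", " + r.endRow + ")");
        }
    }
}
```

In MapReduce proper, each InputSplit returned by getSplits() is normally handed to its own map task, so Terry's expectation is reasonable; seeing one task run all splits serially suggests the splits are being combined or consumed by a single record reader somewhere between Pig and the job.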

Best Regards,
Ruslan Al-Fakikh