Just to add to my earlier response: if you need to fetch data from an
RDBMS in your mappers using custom MapReduce code, you can use
DBInputFormat in your job together with MultipleInputs. Be careful with
the number of mappers you configure, since databases usually limit the
maximum number of simultaneous connections. You also need to ensure that
the same query is not executed n times in n mappers, all fetching the
same data; that would just waste network bandwidth. Sqoop
+ Hive would be my recommendation and a good combination for such use
cases. If you have Pig competency, you can also look into Pig instead of
Hive.
Hope it helps!
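
To illustrate the point about not running the same query in every mapper,
here is a minimal sketch of one way to give each mapper a disjoint slice of
a table. It assumes the table has a numeric key column whose minimum and
maximum values are known up front. The class name KeyRangeSplitter, its
methods, and the column name "id" are hypothetical and only for
illustration; this is not how DBInputFormat or Sqoop compute their splits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): divides a numeric key range
// into contiguous, non-overlapping slices, one per mapper, so that no two
// mappers fetch the same rows.
public class KeyRangeSplitter {

    // Splits [minId, maxId] into at most numMappers inclusive ranges.
    // Each element of the result is {start, end}.
    public static List<long[]> split(long minId, long maxId, int numMappers) {
        if (numMappers < 1 || maxId < minId) {
            throw new IllegalArgumentException("invalid range or mapper count");
        }
        long total = maxId - minId + 1;
        long base = total / numMappers;   // minimum slice size
        long extra = total % numMappers;  // first 'extra' slices get one more key
        List<long[]> ranges = new ArrayList<>();
        long start = minId;
        for (int i = 0; i < numMappers && start <= maxId; i++) {
            long size = base + (i < extra ? 1 : 0);
            long end = start + size - 1;
            ranges.add(new long[] {start, end});
            start = end + 1;
        }
        return ranges;
    }

    // Builds the WHERE condition a single mapper would append to its query.
    public static String whereClause(String column, long[] range) {
        return column + " >= " + range[0] + " AND " + column + " <= " + range[1];
    }

    public static void main(String[] args) {
        // Assumed example: keys 1..100 spread over 4 mappers.
        for (long[] r : split(1, 100, 4)) {
            System.out.println(whereClause("id", r));
        }
    }
}
```

With slices like these, n mappers open n connections but each row crosses
the network only once. The mapper count should still stay below the
connection limit of the database.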
On Tue, Dec 6, 2011 at 1:36 AM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
> If I understand your requirement correctly, you need to bring in data from
> multiple RDBMS sources and join it, and maybe run some more custom
> operations on top of that. For this you don't need to write custom
> MapReduce code unless it is really required. You can achieve the same in
> two easy steps:
> - Import the data from the RDBMS into Hive using Sqoop (import)
> - Use Hive to do the join and any further processing on this data
> Hope it helps!
> On Tue, Dec 6, 2011 at 12:13 AM, Justin Vincent <[EMAIL PROTECTED]> wrote:
>> I would like to join some db tables, possibly from different databases,
>> in an MR job.
>> I would essentially like to use MultipleInputs, but that seems
>> file-oriented. I need a different mapper for each db table.
>> Justin Vincent