Re: UDF and rdbms lookups
There is an uncommitted Piggybank UDF which may help you:
https://issues.apache.org/jira/browse/PIG-1229
You can try the first patch (pig-1229.2.patch by Ankur) listed on that
page. It does something different, writing rows from Pig into the DB,
but you can borrow the DB connection part from it.

Note to self: I really want to get this patch committed before more
people reinvent the wheel of making Pig talk to a DB.
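
For what it's worth, the lazy-initialization idea from Dmitriy's quoted
message below could look roughly like this. It is a minimal sketch, not
code from the PIG-1229 patch: the class name LazyDbLookup, the users
table and its columns, and the JDBC URL are all made up for illustration,
and it is written as a plain Java class so it compiles without Pig on the
classpath. In a real UDF the class would extend EvalFunc and exec() would
call lookup().

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper showing the "null handle, connect on first use" pattern.
// Assumption: in a Pig UDF this logic would live in a class extending
// org.apache.pig.EvalFunc, with exec(Tuple) delegating to lookup().
class LazyDbLookup {
    private final String jdbcUrl;    // assumed JDBC URL, supplied by the caller
    private Connection conn;         // stays null until the first lookup
    private PreparedStatement stmt;  // prepared once per connection

    LazyDbLookup(String jdbcUrl) {
        // Deliberately no connection here: the constructor only stores the URL.
        this.jdbcUrl = jdbcUrl;
    }

    // (Re)open the connection if it was never opened or is no longer usable.
    private void ensureConnected() throws SQLException {
        if (conn == null || !conn.isValid(2)) {
            conn = DriverManager.getConnection(jdbcUrl);
            // Hypothetical table and column names.
            stmt = conn.prepareStatement("SELECT name FROM users WHERE id = ?");
        }
    }

    // Returns the name for the given id, or null if no row matches.
    String lookup(long id) throws SQLException {
        ensureConnected();
        stmt.setLong(1, id);
        try (ResultSet rs = stmt.executeQuery()) {
            return rs.next() ? rs.getString(1) : null;
        }
    }
}
```

The connection is opened on the first lookup() and then reused for every
later row, so each task pays the connection cost once. The isValid()
check on each call can cost a round trip to the database, so a real UDF
might instead reconnect only after catching a SQLException.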

On Thu, Jul 1, 2010 at 09:48, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Also -- I hope your cluster is not too big. It's really easy to DDoS your
> database using Hadoop.
> On Thu, Jul 1, 2010 at 9:47 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>> The simplest thing you can do is to keep a database handle at the object
>> level, set it to null, and just initialize it in exec() if you see that it's
>> null.
>> You can also init the connection in the constructor.
>> A static dbh will let you share it across tasks, if you persist the JVM.
>> Naturally you will want to throw in some code to handle dropped connections
>> and all that.
>> On Thu, Jul 1, 2010 at 9:01 AM, Dave Viner <[EMAIL PROTECTED]> wrote:
>>> In a custom UDF, what's the most appropriate way to initialize and connect
>>> to an old-fashioned RDBMS?
>>> I wrote a simple UDF which opens/closes a connection on each exec(), but
>>> this feels a bit like overkill. Is there an "init()" method that is invoked
>>> in a UDF to help with one-time initialization (like a database connection or
>>> SQL query preparation)?
>>> Thanks
>>> Dave Viner