Pig, mail # user - Re: UDF and rdbms lookups


Re: UDF and rdbms lookups
Ashutosh Chauhan 2010-07-01, 18:03
That will be a day of rejoicing, when a multi-million Oracle deployment
is brought to a grinding halt by a teeny-weeny 4-line Pig script. *wink* ;)

Ashutosh
On Thu, Jul 1, 2010 at 10:52, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Can you put a LOG.info and javadoc into this patch saying "watch out, DB
> connection bomb being deployed"? :)
>
> On Thu, Jul 1, 2010 at 10:48 AM, Ashutosh Chauhan <
> [EMAIL PROTECTED]> wrote:
>
>> There is an uncommitted Piggybank UDF which may help you:
>> https://issues.apache.org/jira/browse/PIG-1229. You can try the first
>> patch (pig-1229.2.patch by Ankur) listed on the page. It does a
>> different thing, writing rows from Pig into the DB, but you can
>> borrow the DB connection part from it.
>>
>> Note to self: I really want to get this patch committed before more
>> people reinvent the wheel of making Pig talk to DB.
>>
>> On Thu, Jul 1, 2010 at 09:48, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>> > Also -- I hope your cluster is not too big. It's really easy to DDoS your
>> > database using Hadoop.
>> >
>> > On Thu, Jul 1, 2010 at 9:47 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>> >
>> >> The simplest thing you can do is to keep a database handle at the object
>> >> level, set it to null, and just initialize it in exec() if you see that
>> >> it's null.
>> >> You can also init the connection in the constructor.
>> >> A static dbh will let you share it across tasks, if you reuse the JVM
>> >> between tasks.
>> >> Naturally you will want to throw in some code to handle dropped
>> >> connections and all that.
>> >>
>> >>
>> >>
>> >> On Thu, Jul 1, 2010 at 9:01 AM, Dave Viner <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> In a custom UDF, what's the most appropriate way to initialize and
>> >>> connect to an old-fashioned RDBMS?
>> >>>
>> >>> I wrote a simple UDF which opens/closes a connection on each exec(),
>> >>> but this feels a bit like overkill.  Is there an "init()" method that
>> >>> is invoked in a UDF to help with one-time initialization (like a
>> >>> database connection or SQL query preparation)?
>> >>>
>> >>> Thanks
>> >>> Dave Viner
>> >>>
>> >>
>> >>
>> >
>>
>
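
For anyone landing on this thread later: the "handle starts as null, open it on first use, reopen it if the connection dropped" advice quoted above could look roughly like the sketch below. This is only an illustration under stated assumptions, not code from the PIG-1229 patch. The class name LazyDbHandle, the forUrl() helper and the timesOpened() counter are made up for the example, and the class is kept free of Pig imports so it compiles on its own. In a real UDF the holder would be a field of your EvalFunc subclass and get() would be called at the top of exec().

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.Callable;

/**
 * Sketch of a lazily opened database handle (hypothetical helper class).
 * Nothing is opened in the constructor; the first get() opens the
 * connection, and later calls reuse it unless it looks dead.
 */
class LazyDbHandle {
    private final Callable<Connection> opener; // how to open a connection
    private Connection conn;                   // stays null until first use
    private int opens;                         // number of (re)connects so far

    LazyDbHandle(Callable<Connection> opener) {
        this.opener = opener;
    }

    /** Convenience factory for a plain JDBC URL (the URL is an assumption). */
    static LazyDbHandle forUrl(final String jdbcUrl) {
        return new LazyDbHandle(() -> DriverManager.getConnection(jdbcUrl));
    }

    /** Returns the shared connection, opening or reopening it if needed. */
    synchronized Connection get() throws Exception {
        if (conn == null || isDead(conn)) {
            conn = opener.call();
            opens++;
        }
        return conn;
    }

    /** Treats a closed, invalid, or erroring connection as dropped. */
    private static boolean isDead(Connection c) {
        try {
            // isValid takes a timeout in seconds; 2 is an arbitrary choice.
            return c.isClosed() || !c.isValid(2);
        } catch (SQLException e) {
            return true;
        }
    }

    synchronized int timesOpened() {
        return opens;
    }
}
```

Holding the LazyDbHandle in a static field would correspond to the "static dbh" variant mentioned above, so that tasks running in the same reused JVM share one connection. Either way, every concurrently running task can still hold its own connection, so the warning about overwhelming the database on a large cluster still applies.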