Re: UDF and rdbms lookups
There is an uncommitted Piggybank UDF that may help you:
https://issues.apache.org/jira/browse/PIG-1229
You can try the first patch listed on the page (pig-1229.2.patch by
Ankur). It does something different, writing rows from Pig into the DB,
but you can borrow the DB connection part from it.

Note to self: I really want to get this patch committed before more
people reinvent the wheel of making Pig talk to a DB.

On Thu, Jul 1, 2010 at 09:48, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Also, I hope your cluster is not too big. It's really easy to DDoS your
> database using Hadoop.
>
> On Thu, Jul 1, 2010 at 9:47 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>> The simplest thing you can do is keep a database handle at the object
>> level, set it to null, and initialize it in exec() if you see that it's
>> null.
>> You can also init the connection in the constructor.
>> A static dbh will let you share it across tasks, if you reuse the JVM.
>> Naturally, you will want to add some code to handle dropped connections
>> and the like.
>>
>>
>>
>> On Thu, Jul 1, 2010 at 9:01 AM, Dave Viner <[EMAIL PROTECTED]> wrote:
>>
>>> In a custom UDF, what's the most appropriate way to initialize and connect
>>> to an old-fashioned RDBMS?
>>>
>>> I wrote a simple UDF which opens/closes a connection on each exec(), but
>>> this feels a bit like overkill. Is there an "init()" method that is invoked
>>> in a UDF to help with one-time initialization (like a database connection
>>> or SQL query preparation)?
>>>
>>> Thanks
>>> Dave Viner
>>>
>>
>>
>
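
For anyone reading this thread later, the lazy-initialization approach Dmitriy describes in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the PIG-1229 patch: the class name LazyLookup, the Handle interface, and the connector argument are all hypothetical stand-ins so that the sketch compiles without Pig or a JDBC driver. In a real UDF the class would extend org.apache.pig.EvalFunc, exec() would take a Tuple and could throw IOException, and the handle would be a java.sql.Connection.

```java
import java.util.function.Supplier;

// Stand-in for a Pig UDF. A real one would extend org.apache.pig.EvalFunc
// and hold a java.sql.Connection; both are replaced by hypothetical types
// here so the sketch compiles without Pig or a JDBC driver.
class LazyLookup {
    // Hypothetical database handle: looks up a value for a key and may
    // throw a RuntimeException if the connection has been dropped.
    interface Handle {
        String lookup(String key);
    }

    private final Supplier<Handle> connector; // opens a fresh handle
    private Handle dbh = null;                // null until the first exec()

    LazyLookup(Supplier<Handle> connector) {
        this.connector = connector;
    }

    // Called once per input record, like EvalFunc.exec().
    String exec(String key) {
        if (dbh == null) {
            dbh = connector.get(); // one-time initialization
        }
        try {
            return dbh.lookup(key);
        } catch (RuntimeException dropped) {
            // Assume the connection died: reconnect once and retry.
            dbh = connector.get();
            return dbh.lookup(key);
        }
    }
}
```

Declaring dbh as static instead would share one handle across tasks when the JVM is reused, as noted in the quoted message. Either way, each task still opens its own connection, so the warning about overloading the database on a large cluster still applies.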