-
Best practice for DB connection
Mark Kerzner 2012-03-07, 01:02
Hi,
I need to initialize the HBase connection, which I normally do in configure() in the Mapper, and then my mapper uses it. How do I do it in Pig?
I am ready to define a UDF that will return a handle, but is it a best practice?
Thank you, Mark
-
Re: Best practice for DB connection
Bill Graham 2012-03-07, 01:14
Have you checked out HBaseStorage?
http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
-
Re: Best practice for DB connection
Mark Kerzner 2012-03-07, 02:30
I see, Bill, thank you. But I think I need something different. I am processing the input line by line, and for some elements I extract from each line I do HBase lookups. So I need a connection that stays open for the life of the mapper.
Thank you, Mark
-
Re: Best practice for DB connection
Raghu Angadi 2012-03-07, 08:27
On Tue, Mar 6, 2012 at 5:02 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> I need to initialize the HBase connection, which I normally do in configure() in the Mapper, and then my mapper uses it. How do I do it in Pig?
> I am ready to define a UDF that will return a handle, but is it a best practice?
Yes. You can initialize inside the first call to UDF.exec(). The same UDF object is used for the entire mapper.
Don't initialize inside the constructor of the UDF. AFAIK there is no way to tell how many times or when the constructor is called (though it is no more than a handful of times on the front end).
Raghu.
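A minimal sketch of the pattern described above (open the connection on the first exec() call, not in the constructor). It is deliberately written without Pig or HBase on the classpath: LazyLookupFunc, LookupClient, and the Supplier-based factory are hypothetical stand-ins, not part of either library. In a real UDF the class would presumably extend org.apache.pig.EvalFunc and the client would be the HBase table handle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Sketch of the "initialize on the first exec() call" pattern.
 * LookupClient and LazyLookupFunc are hypothetical stand-ins; a real
 * Pig UDF would extend EvalFunc and hold an HBase table handle instead.
 */
public class LazyLookupFunc {

    /** Hypothetical stand-in for an open connection such as an HBase table. */
    public interface LookupClient {
        String get(String key);
    }

    private final Supplier<LookupClient> factory; // knows how to open the connection
    private LookupClient client;                  // stays null until the first exec()

    // The constructor only stores the factory. It opens nothing, because
    // the constructor may also run on the front end, away from the mapper.
    public LazyLookupFunc(Supplier<LookupClient> factory) {
        this.factory = factory;
    }

    // Called once per input record. Opens the connection on the first
    // call with a non-null key and reuses it for every later call.
    public String exec(String key) {
        if (key == null) {
            return null;
        }
        if (client == null) {
            client = factory.get();
        }
        return client.get(key);
    }

    public static void main(String[] args) {
        final int[] opened = {0}; // counts how often the factory runs
        LazyLookupFunc udf = new LazyLookupFunc(() -> {
            opened[0]++;
            Map<String, String> table = new HashMap<>();
            table.put("row1", "value1");
            return table::get; // in-memory map standing in for the remote table
        });
        System.out.println(udf.exec("row1"));
        System.out.println(udf.exec("missing"));
        System.out.println("connections opened: " + opened[0]);
    }
}
```

Because the same UDF object serves the whole mapper, the factory should run once per task no matter how many records pass through exec(). Closing the connection at the end of the task is left out of this sketch.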
-
Re: Best practice for DB connection
Norbert Burger 2012-03-07, 14:38
Out of curiosity, is there an equivalent to .exec() for Python UDFs? We had the same issue recently.
Norbert
-
Re: Best practice for DB connection
Alan Gates 2012-03-07, 17:43
The Python UDF itself is the equivalent of exec(). There's no constructor for Python UDFs, since each one is just a function rather than a class.
Alan.
-
Re: Best practice for DB connection
Mark Kerzner 2012-03-07, 17:49
Raghu,
It almost works. It just gives me a ZooKeeper exception, probably because the Pig environment does not know enough about HBase, so that
    Configuration hConf = HBaseConfiguration.create();
does not have all that it needs for the HBase connection. Here is my code snippet:
    Configuration hConf = HBaseConfiguration.create();
    hConf.set(HBASE_CONFIGURATION_ZOOKEEPER_QUORUM, zookeeperUrl);
    hConf.set(HBASE_CONFIGURATION_ZOOKEEPER_CLIENTPORT, zookeeperPort);
    HTable hTable = new HTable(hConf, tableName);
Thank you, Mark
-
Re: Best practice for DB connection
Dmitriy Ryaboy 2012-03-09, 02:26
Yeah... check out what we do in HBaseStorage to pass the config around. It's a bit gnarly.
D
-
Re: Best practice for DB connection
Mark Kerzner 2012-03-09, 02:30
Thank you, Dmitriy, indeed, it was so gnarly that I used an MR job instead :)
Mark