RE: Using Distributed Cache in Hive UDF's??
Viraj Bhat 2010-06-24, 17:33
Hi Edward,
I was able to use the distributed cache by setting the mapred.cache.files option. I could then read the files locally using the standard Java APIs.

Thanks
Viraj

________________________________
From: Edward Capriolo [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 22, 2010 7:44 AM
To: [EMAIL PROTECTED]
Subject: Re: Using Distributed Cache in Hive UDF's??

If you put a file in the distributed cache, it is in the working directory of the UDF, so you do not need fancy Hadoop-isms to access it.

Shameless plug: my geo-ip-udf does exactly this.
http://www.jointhegrid.com/hive-udf-geo-ip-jtg/index.jsp
http://www.jointhegrid.com/svn/hive-udf-geo-ip-jtg/

Edward

On Mon, Jun 21, 2010 at 7:03 PM, Viraj Bhat <[EMAIL PROTECTED]> wrote:

Hi all,

I have a lookup function in Hive that checks whether a certain pattern is present in a large text file. I upload this text file to HDFS and hope to use it in my UDF's evaluate() method. Is there some documentation I can look up?

Distributed Cache relies on lookupFiles = DistributedCache.getLocalCacheFiles(job); where job is of type JobConf. Where do I get the JobConf object from within the UDF?

Thanks
Viraj
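
For anyone finding this thread later, here is a minimal sketch of the approach described above: the file is shipped through the distributed cache and then read from the task's working directory with plain Java I/O, so no JobConf is needed. The class name PatternLookup, the file name patterns.txt, and the one-pattern-per-line exact-match format are assumptions made for illustration and are not from the thread. To keep the sketch self-contained it is a plain Java class; in a real Hive UDF the class would extend org.apache.hadoop.hive.ql.exec.UDF and evaluate() would typically take and return Hive/Hadoop types.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Sketch only: in Hive this class would extend org.apache.hadoop.hive.ql.exec.UDF.
class PatternLookup {
    // Assumed name of the cached file as it appears in the task's working directory.
    private final Path lookupFile;

    // Loaded on the first call to evaluate(), then reused for later rows.
    private Set<String> patterns;

    PatternLookup() {
        // Relative path: resolved against the current working directory.
        this(Paths.get("patterns.txt"));
    }

    PatternLookup(Path lookupFile) {
        this.lookupFile = lookupFile;
    }

    // Returns true if the value matches one of the lines in the lookup file.
    boolean evaluate(String value) throws IOException {
        if (value == null) {
            return false;
        }
        if (patterns == null) {
            patterns = load(lookupFile);
        }
        return patterns.contains(value.trim());
    }

    // Reads the file with standard Java I/O, one pattern per line, skipping blanks.
    private static Set<String> load(Path file) throws IOException {
        Set<String> result = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
        }
        return result;
    }
}
```

Loading the file once and keeping it in a set avoids re-reading it for every row. Whether the file actually shows up under that name in the working directory depends on how the cache entry was configured, so treat the relative path as something to verify on your cluster.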