Hive, mail # user - Using Distributed Cache in Hive UDF's??


RE: Using Distributed Cache in Hive UDF's??
Viraj Bhat 2010-06-24, 17:33
Hi Edward,

I was able to use the distributed cache by setting the
mapred.cache.files option. I could then read the files locally using
standard Java APIs.
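
For readers landing on this thread later, here is a minimal sketch of what reading the cached file locally with standard Java APIs might look like. The class name PatternLookup, the constructor argument, and the "any line contains the pattern" rule are assumptions made for illustration and are not taken from the original UDF. A real Hive UDF would also extend org.apache.hadoop.hive.ql.exec.UDF, which is left out so the sketch compiles without Hive on the classpath.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Lookup logic only (hypothetical class). In Hive this would extend
 * org.apache.hadoop.hive.ql.exec.UDF.
 */
class PatternLookup {
    // Local path of the cached file; the name is supplied by the caller.
    private final String localFileName;
    // Lines of the lookup file, loaded lazily on the first evaluate() call.
    private List<String> lines;

    PatternLookup(String localFileName) {
        this.localFileName = localFileName;
    }

    /** Returns true if any line of the lookup file contains the pattern. */
    public Boolean evaluate(String pattern) {
        if (pattern == null) {
            return null; // SQL-style: null in, null out
        }
        if (lines == null) {
            lines = load(localFileName); // read the file only once per instance
        }
        for (String line : lines) {
            if (line.contains(pattern)) {
                return true;
            }
        }
        return false;
    }

    // Plain java.io read of a local file, with no Hadoop classes involved.
    private static List<String> load(String fileName) {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = in.readLine()) != null) {
                result.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read lookup file " + fileName, e);
        }
        return result;
    }
}
```

Loading the file once and keeping it in memory avoids re-reading it for every row. For a very large file, a different structure (or streaming) may be needed.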

Thanks

Viraj

 

________________________________

From: Edward Capriolo [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 22, 2010 7:44 AM
To: [EMAIL PROTECTED]
Subject: Re: Using Distributed Cache in Hive UDF's??

 

If you put a file in the distributed cache, it ends up in the working
directory of the UDF, so you do not need any fancy Hadoop-isms to access
it.

Shameless plug:
My geo-ip-udf does exactly this.
http://www.jointhegrid.com/hive-udf-geo-ip-jtg/index.jsp
http://www.jointhegrid.com/svn/hive-udf-geo-ip-jtg/
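
The working-directory point can be sketched roughly as follows, assuming the file really has been localized into the task's working directory. The class WorkingDirFiles and the file names in the usage note are hypothetical and are not taken from the geo-ip UDF linked above.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: find a cached file by its bare name. */
class WorkingDirFiles {

    /** Resolves a bare file name against baseDir and checks that it is readable. */
    static Path resolve(Path baseDir, String bareName) throws FileNotFoundException {
        Path p = baseDir.resolve(bareName).toAbsolutePath().normalize();
        if (!Files.isReadable(p)) {
            // Fail early with the full path, which makes task logs easier to read.
            throw new FileNotFoundException("Expected cached file at " + p);
        }
        return p;
    }

    /** Same as above, relative to the JVM's current working directory. */
    static Path resolve(String bareName) throws FileNotFoundException {
        return resolve(Paths.get(""), bareName);
    }
}
```

Inside a UDF, something like WorkingDirFiles.resolve("lookup.txt") would then give a path that can be opened with ordinary java.io or java.nio calls, with no JobConf required.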

Edward

On Mon, Jun 21, 2010 at 7:03 PM, Viraj Bhat <[EMAIL PROTECTED]> wrote:

Hi all,

I have a lookup function in Hive that checks whether a certain pattern
is present in a large text file. I upload this text file to HDFS and hope
to use it in my UDF's evaluate() method.

Is there some documentation I can look up?

As far as I can tell, the Distributed Cache API relies on a call such as

lookupFiles = DistributedCache.getLocalCacheFiles(job);

where job is of type JobConf.

Where do I get the JobConf object from within the UDF?

 

Thanks

Viraj