Re: Child JVM, Distributed Cache and Language Embedding
Actually, Hadoop places the symlink in the local
working directory of the JVM process.  I use this
method to push shared objects (CUDA and OpenCL) to
the nodes for tasks.  There is a section in the
Hadoop docs on shared object loading that should
help (although I found that I did not need, and was
not able, to make the System.loadLibrary call
recommended there).
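
A minimal sketch of how a task could recover the location
behind that symlink, using only the JDK. The link name
"lang" and the class name CacheLocation are hypothetical;
the real name is whatever symlink name the job configured
for the cached archive. Whether the resolved path stays the
same across task attempts in a reused JVM is an assumption
to verify on your own cluster.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CacheLocation {
    /**
     * Resolves the entry named linkName inside workDir to the real,
     * absolute path it points at (symlinks are followed).
     */
    static Path resolve(Path workDir, String linkName) throws IOException {
        Path link = workDir.resolve(linkName);
        if (!Files.exists(link)) {
            throw new IOException("No such entry in working directory: " + link);
        }
        // toRealPath() follows symbolic links by default.
        return link.toRealPath();
    }

    public static void main(String[] args) throws IOException {
        // "lang" is a hypothetical symlink name, not something Hadoop defines.
        String linkName = args.length > 0 ? args[0] : "lang";
        // The task's current working directory, where the symlink is expected.
        Path cwd = Paths.get("").toAbsolutePath();
        System.out.println(resolve(cwd, linkName));
    }
}
```

Calling CacheLocation.resolve(cwd, "lang") once at engine
start-up and handing the result to the embedded language
avoids depending on the per-attempt working directory.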

You can also bundle the files into a subdirectory of
the JAR file, and they will be unpacked to the local
working directory.  The nice thing about using the
distributed cache is that the files only need to be
pushed to the cluster once, with a copyFromLocal, and
are then just symlinked at runtime, so it is much faster.
On 2/13/2013 1:55 AM, Saptarshi Guha wrote:
> Hmm,
> DistributedCache.getLocalCacheArchives
>
>
> On Tue, Feb 12, 2013 at 9:28 PM, Saptarshi Guha
> <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>
>     Hello,
>
>     I'm a bit fuzzy on the details here, so I'd appreciate your help.
>
>     I am embedding a language into the JVM. My hadoop job will
>     instantiate the child JVM once for all tasks assigned
>     (mapred.job.reuse.jvm.num.tasks = -1)
>
>     So if a node can run 6 parallel JVMs, it will and these 6 will churn
>     through all the tasks assigned to them.
>
>     Now, per JVM, the language engine will be instantiated. For this to
>     work, I will ship the language distribution to the nodes (the nodes
>     are really bare and installing the language on the node is not an
>     option) using the distributed cache (as a tar.gz file).
>
>     My understanding is that Hadoop MapReduce will unarchive this tgz
>     file and then, for every task attempt, symlink it into the task
>     attempt's working folder.
>
>     However, for the language engine to be successfully initialized, I
>     need to know the location of the unarchived file, a location that
>     will stay constant across all task attempts for that child JVM.
>
>     Q: How can I infer this location?
>
>     Cheers
>     Saptarshi
>
>

--
========= mailto:[EMAIL PROTECTED] ===========
David W. Boyd
Vice President, Operations
Lorenz Research, a Data Tactics corporation
7901 Jones Branch, Suite 610
Mclean, VA 22102
office:   +1-703-506-3735, ext 308
fax:     +1-703-506-6703
cell:     +1-703-402-7908
============== http://www.lorenzresearch.com/ ===========
