On Tue, Feb 12, 2013 at 9:28 PM, Saptarshi Guha <[EMAIL PROTECTED]> wrote:
> I'm a bit fuzzy on the details here, so I'd appreciate your help.
> I am embedding a language into the JVM. My hadoop job will instantiate the
> child JVM once for all tasks assigned (mapred.job.reuse.jvm.num.tasks > -1).
> So if a node can run 6 parallel JVMs, it will, and these 6 will churn
> through all the tasks assigned to them.
> Now, per JVM, the language engine will be instantiated. For this to work,
> I will ship the language distribution to the nodes (the nodes are really
> bare and installing the language on the node is not an option) using the
> distributed cache (as a tar.gz file).
> My understanding is that Hadoop MapReduce will unarchive this tgz file and
> then, for every task attempt, symlink it into the task attempt's working
> directory.
> However, for the language engine to be successfully initialized, I need to
> know the location of the unarchived file, a location that will stay
> constant across all task attempts for that child JVM.
> Q: How can I infer this location?
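
One possible approach, sketched under assumptions rather than confirmed against your setup: if the framework does symlink the unarchived directory into each task attempt's working directory as you describe, then resolving that symlink from inside the JVM should give the real, unarchived location. The link name "lang" below is hypothetical (it would be whatever name your archive is linked under), the class and method names are made up for illustration, and the code needs Java 7 for java.nio.file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CacheLocation {
    // Follow symlinks and return the absolute, real path the link points to.
    // Assumption: the argument is the symlink created in the task attempt's
    // working directory for the unarchived tgz.
    public static Path resolveUnarchivedRoot(Path linkInWorkingDir) throws IOException {
        return linkInWorkingDir.toRealPath();
    }

    public static void main(String[] args) throws IOException {
        // "lang" is a hypothetical link name, relative to the current working directory.
        Path link = Paths.get(args.length > 0 ? args[0] : "lang");
        if (Files.exists(link)) {
            System.out.println(resolveUnarchivedRoot(link));
        } else {
            System.out.println("no such link: " + link.toAbsolutePath());
        }
    }
}
```

You would call something like resolveUnarchivedRoot once, in the first task the JVM runs, and hand the result to the language engine. Whether the resolved directory really stays the same for every later task attempt in a reused JVM is something to verify on your cluster.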