MapReduce, mail # user - Best practices configuring libraries on the backend.

Re: Best practices configuring libraries on the backend.
George Datskos 2012-03-29, 00:04

I've tested it on Hadoop 1.0.0 and 1.0.1.  (I don't know which version
cdh3u3 is based on.)

In hadoop-env.sh, if I set
HADOOP_TASKTRACKER_OPTS="-Djava.library.path=/usr/blah", the TaskTracker
sees that option and passes it along to all M/R child tasks on
that node.  Can you confirm (with the ps command) that your TaskTrackers
are actually seeing the option?
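
A minimal sketch of that ps check, assuming a Hadoop 1.x node where the
TaskTracker JVM can be identified by its main class name,
org.apache.hadoop.mapred.TaskTracker.  The function names are made up for
illustration, and the parsing assumes the library path contains no spaces:

```shell
#!/bin/sh

# Print the value of -Djava.library.path found in a JVM command line.
# Returns 1 if the option is not present.
# $1: a full command line, e.g. one line of `ps -eo args` output.
extract_library_path() {
    # Unquoted on purpose: split the command line into words.
    for arg in $1; do
        case "$arg" in
            -Djava.library.path=*)
                printf '%s\n' "${arg#-Djava.library.path=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Report the library path of every TaskTracker JVM running on this node.
# Assumption: the process command line contains the main class name.
# The [o] in the pattern keeps grep from matching its own process.
check_tasktrackers() {
    ps -eo args | grep '[o]rg.apache.hadoop.mapred.TaskTracker' |
    while IFS= read -r cmdline; do
        if path=$(extract_library_path "$cmdline"); then
            echo "TaskTracker java.library.path: $path"
        else
            echo "TaskTracker started without -Djava.library.path"
        fi
    done
}
```

Sourcing this and running check_tasktrackers on a node should show whether
the option from hadoop-env.sh actually reached the TaskTracker process.  If
nothing is printed, no process matching the assumed class name was found.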

On 2012/03/29 5:19, Dmitriy Lyubimov wrote:
> Hm, that doesn't seem to work for me (with cdh3u3).
> I defined
> export HADOOP_TASKTRACKER_OPTS="-Djava.library.path=/usr/...."
> and it has no effect (as opposed to when I set it with a <final>
> mapred.child.java.opts property on the data node).
> Still puzzling.
> On Tue, Mar 27, 2012 at 7:17 PM, George Datskos
> <[EMAIL PROTECTED]>  wrote:
>> Dmitriy,
>> I just double-checked, and the caveat I stated earlier is incorrect.  So,
>>   "-Djava.library.path" set in the client's {mapred.child.java.opts} should
>> just append to the "-Djava.library.path" that each TaskTracker has when
>> creating the library path for each child (M/R) task.  So that's even better
>> I guess.
>> George
>> On 2012/03/28 11:06, George Datskos wrote:
>>> Dmitriy,
>>> To deal with different servers having various shared libraries in
>>> different locations, you can simply make sure the _TaskTracker_'s
>>> -Djava.library.path is set correctly on each server.  That library path
>>> should be passed along to each child (M/R) task.  (in *addition* to the
>>> {mapred.child.java.opts} that you specify on the client-side configuration
>>> options)
>>> One caveat: on the client-side, don't include "-Djava.library.path" or
>>> that path will be passed along to all of the child tasks, overriding
>>> the site-specific one you set on the TaskTracker.
>>> George
>>> On 2012/03/28 10:43, Dmitriy Lyubimov wrote:
>>>> Hello,
>>>> I have a couple of questions regarding mapreduce configurations.
>>>> We install various platforms on data nodes that require a mixed set of
>>>> native libraries.
>>>> Part of the problem is that, in the general case, these software platforms
>>>> may be installed in different locations on the backend (we try to
>>>> unify it, but still). This means a site-specific
>>>> -Djava.library.path setting may be required.
>>>> I configured individual JVM options (mapred.child.java.opts) on each
>>>> node to include a specific set of paths. However, I encountered 2
>>>> problems:
>>>> #1: My setting doesn't take effect unless I also declare it final
>>>> on the data node. Otherwise it is overridden by the default -Xmx200m value
>>>> from the driver, EVEN when I don't set it on the driver at all (and
>>>> there seems to be no way to unset it).
>>>> However, declaring it "final" on the backend creates a problem if any
>>>> of the numerous jobs we run still wants to override the setting. The
>>>> ideal behavior would be: if I don't set it in the driver, the backend value
>>>> kicks in; otherwise the driver's value is used. But I did not find a way to
>>>> do that for this particular setting. Could somebody
>>>> clarify the best workaround? Thank you.
>>>> #2: The ideal behavior would actually be to merge driver-specific and
>>>> backend-specific settings. E.g., the backend may need to configure specific
>>>> software package locations, while the client may sometimes wish to set the
>>>> heap size, etc. Is there a best practice for achieving this?
>>>> Thank you very much in advance.
>>>> -Dmitriy