-
Best practices configuring libraries on the backend.
Dmitriy Lyubimov 2012-03-28, 01:43
Hello,
I have a couple of questions regarding mapreduce configurations.
We install various platforms on the data nodes that require a mixed set of native libraries.
Part of the problem is that, in the general case, these software platforms may be installed in different locations on the backend (we try to unify them, but still). This means they may require a site-specific -Djava.library.path setting.
I configured individual JVM options (mapred.child.java.opts) on each node to include a specific set of paths. However, I encountered two problems:
#1: My setting doesn't take effect unless I also declare it final on the data node. Otherwise it is simply overridden by the default -Xmx200m value from the driver, EVEN when I don't set it on the driver at all (and there seems to be no way to unset it).
However, using the "final" spec on the backend creates a problem if some of the numerous jobs we run still wish to override the setting. The ideal behavior would be: if I don't set it in the driver, the backend value kicks in; otherwise the driver's value is used. But for some reason I did not find a way to do that for this particular setting. Could somebody clarify the best workaround? Thank you.
#2: The ideal behavior would actually be to merge driver-specific and backend-specific settings. E.g., the backend may need to configure specific software package locations, while the client may sometimes wish to set the heap size, etc. Is there a best practice to achieve this effect?
Thank you very much in advance. -Dmitriy
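A toy model may help explain symptom #1. Hadoop's Configuration documentation states that once a resource declares a value final, no subsequently-loaded resource can alter it; otherwise a later resource simply replaces the earlier value. Assuming (as I understand Hadoop 1.x) that a task's configuration loads the node's mapred-site.xml first and the job's job.xml afterwards, and that job.xml already carries the client-side default -Xmx200m, that rule alone produces both behaviours described above. The bash below only simulates the rule; the function names, the node value and the load order are illustrative assumptions, not Hadoop code.

```bash
# Toy model of "a later resource overrides, unless an earlier one marked the
# property final" (illustration only, not Hadoop code). Needs bash 4+.
declare -A CONF_VALUE=()   # property name -> current value
declare -A CONF_FINAL=()   # property name -> 1 once a resource marks it final

# set_prop NAME VALUE [final]: apply one property from the next resource.
set_prop() {
  local name="$1" value="$2" flag="${3:-}"
  if [[ -n "${CONF_FINAL[$name]:-}" ]]; then
    return 0   # an earlier resource declared it final: ignore this value
  fi
  CONF_VALUE[$name]="$value"
  if [[ "$flag" == final ]]; then
    CONF_FINAL[$name]=1
  fi
}

# resolve_child_opts [final]: load a hypothetical node mapred-site.xml value
# (optionally final), then the job.xml value, and print what the task sees.
resolve_child_opts() {
  CONF_VALUE=()
  CONF_FINAL=()
  set_prop mapred.child.java.opts "-Xmx1g -Djava.library.path=/opt/node/lib" "${1:-}"
  set_prop mapred.child.java.opts "-Xmx200m"   # client default carried in job.xml
  printf '%s\n' "${CONF_VALUE[mapred.child.java.opts]}"
}
```

Here resolve_child_opts prints -Xmx200m, while resolve_child_opts final prints the node's value, which no job can then override. Because each resource replaces the whole string, the model also shows why the node and client values are not merged (question #2).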
-
Re: Best practices configuring libraries on the backend.
George Datskos 2012-03-28, 02:06
Dmitriy,
To deal with different servers having various shared libraries in different locations, you can simply make sure the _TaskTracker_'s -Djava.library.path is set correctly on each server. That library path should be passed along to each child (M/R) task (in *addition* to the {mapred.child.java.opts} that you specify in the client-side configuration options).
One caveat: on the client side, don't include "-Djava.library.path", or that path will be passed along to all of the child tasks, overriding the site-specific one you set on the TaskTracker.
George
-
Re: Best practices configuring libraries on the backend.
George Datskos 2012-03-28, 02:17
Dmitriy,
I just double-checked, and the caveat I stated earlier is incorrect. A "-Djava.library.path" set in the client's {mapred.child.java.opts} should simply be appended to the "-Djava.library.path" that each TaskTracker has when it creates the library path for each child (M/R) task. So that's even better, I guess.
George
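George's corrected description can be modelled roughly as below. This is a sketch under assumptions: child_library_path is a hypothetical helper, and placing the client-supplied entries before the TaskTracker's path is my reading of "append", which the thread does not pin down.

```bash
# Hypothetical model: build a child task's java.library.path from the
# TaskTracker's own path and the client's child JVM opts string.
# child_library_path TT_PATH "CHILD_OPTS"
child_library_path() {
  local tt_path="$1" child_opts="$2" opt
  # split the opts string on whitespace, one option per line
  while IFS= read -r opt; do
    case "$opt" in
      -Djava.library.path=*)
        # client-supplied entries first, TaskTracker's path appended
        printf '%s\n' "${opt#-Djava.library.path=}:$tt_path"
        return 0
        ;;
    esac
  done < <(printf '%s\n' "$child_opts" | tr -s ' \t' '\n\n')
  printf '%s\n' "$tt_path"   # client gave no library path: inherit the TT's
}
```

For example, child_library_path /opt/node/lib "-Xmx512M -Djava.library.path=/home/mycompany/lib" prints /home/mycompany/lib:/opt/node/lib. In effect this is the merge asked for in #2, but only for the library path, not for the other JVM options.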
-
Re: Best practices configuring libraries on the backend.
Dmitriy Lyubimov 2012-03-28, 03:08
Thank you, George. I assume you are referring to setenv.sh on the data nodes to set the library paths for the TaskTracker, right?
-
Re: Best practices configuring libraries on the backend.
Bharath Mundlapudi 2012-03-28, 12:14
Dmitriy,
You can set it separately for map or reduce tasks. Please refer to this link: http://hadoop.apache.org/common/docs/r1.0.1/mapred_tutorial.html#Task+Execution+%26+Environment
<property>
  <name>mapred.map.child.java.opts</name>
  <value>
     -Xmx512M -Djava.library.path=/home/mycompany/lib -verbose:gc -Xloggc:/tmp/@taskid@.gc
     -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
  </value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>
     -Xmx1024M -Djava.library.path=/home/mycompany/lib -verbose:gc -Xloggc:/tmp/@taskid@.gc
     -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
  </value>
</property>
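Bharath's example relies on two things that, as far as I know, the linked tutorial section describes: separate mapred.map.child.java.opts / mapred.reduce.child.java.opts values, and the @taskid@ token being replaced with the task's ID. The sketch below models that with one extra assumption of mine: an unset per-kind value falls back to the generic mapred.child.java.opts. task_java_opts is a hypothetical helper, and the attempt ID in the example is made up.

```bash
# Sketch (assumptions noted above): pick the per-kind opts if set,
# otherwise the generic opts, then expand every @taskid@ token.
# task_java_opts TASK_ID PER_KIND_OPTS GENERIC_OPTS
task_java_opts() {
  local task_id="$1" per_kind="$2" generic="$3"
  local opts="${per_kind:-$generic}"   # empty per-kind value -> generic
  printf '%s\n' "${opts//@taskid@/$task_id}"
}
```

For example, task_java_opts attempt_201203281214_0001_m_000000_0 "-Xmx512M -Xloggc:/tmp/@taskid@.gc" "-Xmx200m" prints -Xmx512M -Xloggc:/tmp/attempt_201203281214_0001_m_000000_0.gc, so each task writes its own GC log.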
-
Re: Best practices configuring libraries on the backend.
Dmitriy Lyubimov 2012-03-28, 20:19
Hm. Doesn't seem to work for me (with cdh3u3). I defined
export HADOOP_TASKTRACKER_OPTS="-Djava.library.path=/usr/...."
and it doesn't seem to work (as opposed to when I set it with the <final> property mapred.child.java.opts on the data node).
Still puzzling.
-
Re: Best practices configuring libraries on the backend.
George Datskos 2012-03-29, 00:04
Dmitriy
I've tested it on Hadoop 1.0.0 and 1.0.1. (I don't know which version cdh3u3 is based on.)
In hadoop-env.sh, if I set HADOOP_TASKTRACKER_OPTS="-Djava.library.path=/usr/blah", the TaskTracker sees that option, and it then gets passed along to all M/R child tasks on that node. Can you confirm that your TaskTrackers are actually seeing the passed option (e.g. with the ps command)?
George
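For the check George suggests, a small helper like the one below can pull the -Djava.library.path flags out of a command line. It is a sketch: library_path_flags is a hypothetical name, and the ps/grep pipeline in the usage note assumes a Linux (procps) ps.

```bash
# Print every -Djava.library.path=... flag found in a JVM command line.
# library_path_flags "COMMAND LINE"
library_path_flags() {
  # one argument per line, then keep only the library-path flags;
  # "|| true" keeps the exit status 0 when no flag is present
  printf '%s\n' "$1" | tr -s ' \t' '\n\n' | grep -e '^-Djava\.library\.path=' || true
}
```

Usage: library_path_flags "$(ps -eo args | grep '[T]askTracker')" prints each flag on the TaskTracker's command line (the [T] keeps grep from matching itself). If more than one flag shows up, the JVM generally honours only one of them (typically the last), which could make an earlier setting look ignored.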
-
Re: Best practices configuring libraries on the backend.
Harsh J 2012-03-29, 04:57
George,
This ought to work. Did you restart all your TTs for it to take effect?
Also, the right way to do this across Hadoop (in 1.0/cdh3/whatever) is to add the following to your hadoop-env.sh:
JAVA_LIBRARY_PATH=/path/to/your/libs:$JAVA_LIBRARY_PATH
This way you do not stand to lose Hadoop's native libs.
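Since the original problem was libraries living in different places on different nodes, Harsh's line can be generalised in hadoop-env.sh to prepend only the candidate directories that actually exist on each node. This is a sketch: prepend_existing_dirs and the example paths are hypothetical, and, like Harsh's version, it keeps the existing JAVA_LIBRARY_PATH at the end.

```bash
# Hypothetical hadoop-env.sh helper: prepend, in the given order, whichever
# candidate native-library directories exist on this node, keeping the
# existing JAVA_LIBRARY_PATH (and so Hadoop's native libs) at the end.
# prepend_existing_dirs DIR...   (updates JAVA_LIBRARY_PATH in place)
prepend_existing_dirs() {
  local dir prefix=""
  for dir in "$@"; do
    if [[ -d "$dir" ]]; then
      prefix="${prefix:+$prefix:}$dir"
    fi
  done
  if [[ -n "$prefix" ]]; then
    JAVA_LIBRARY_PATH="$prefix${JAVA_LIBRARY_PATH:+:$JAVA_LIBRARY_PATH}"
  fi
}
```

A node's hadoop-env.sh could then call, e.g., prepend_existing_dirs /opt/platformA/lib /usr/local/platformB/lib (placeholder paths), and the same file can be shipped to every node. As Harsh notes, the TaskTrackers must be restarted to pick it up.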
On Thu, Mar 29, 2012 at 5:34 AM, George Datskos <[EMAIL PROTECTED]> wrote:
> Dmitriy
>
> I've tested it on hadoop 1.0.0 and 1.0.1. (I don't know which version
> cdh3u3 is based off of)
Just FYI: CDH3 is based on the 0.20+append+security branches, much like the recently renamed 1.0 is.
Harsh J
-
Re: Best practices configuring libraries on the backend.
Dmitriy Lyubimov 2012-03-30, 18:00
Yes. JAVA_LIBRARY_PATH seems to be the approach that works (rather than just putting it into HADOOP_TASKTRACKER_OPTS etc.).
Thanks.