Shi Yu
2010-10-11, 20:50
Charles Lee
2010-10-13, 02:36
M. C. Srivas
2010-10-13, 02:59
Shi Yu
2010-10-13, 15:04
Matt Pouttu-Clarke
2010-10-13, 15:48
Luke Lu
2010-10-13, 18:30
Shi Yu
2010-10-13, 18:45
Luke Lu
2010-10-13, 19:15
Shi Yu
2010-10-13, 19:27
Luke Lu
2010-10-13, 20:28
Shi Yu
2010-10-13, 21:21
Konstantin Boudnik
2010-10-13, 21:26
Luke Lu
2010-10-13, 21:28
Shi Yu
2010-10-13, 23:18
Bharath Mundlapudi
2010-10-14, 00:03
Luke Lu
2010-10-14, 00:04
Shi Yu
2010-10-14, 00:13
Shi Yu
2010-10-14, 00:31
Shi Yu
2010-10-14, 02:02
-
load a serialized object in hadoop
Shi Yu 2010-10-11, 20:50
Hi,

I want to load a serialized HashMap object in Hadoop. The file holding the stored object is 200 MB. I can read that object without trouble in plain Java by setting -Xmx to 1000M. In Hadoop, however, I can never load it into memory. The code is very simple (it just reads from an ObjectInputStream), and no map/reduce is implemented yet. I set mapred.child.java.opts=-Xmx3000M and still get "java.lang.OutOfMemoryError: Java heap space".

Could anyone explain a little how memory is allocated to the JVM in Hadoop? Why does Hadoop take up so much memory? If a program requires 1 GB of memory on a single node, how much memory does it generally require in Hadoop?

Thanks.

Shi
-
Re: load a serialized object in hadoop
Charles Lee 2010-10-13, 02:36
On a 32-bit machine, the largest heap the JVM can provide is in the range of 1.5 GB to 2.0 GB. So if you want a bigger heap, say 3000M, you need a 64-bit machine.

Yours sincerely,
Charles Lee
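One way to check whether the 32-bit limit applies is to ask the running JVM itself. The sketch below is not from the thread: the class name JvmBitness is made up for illustration, and sun.arch.data.model is a HotSpot-specific property that other JVMs may not set, so the code falls back to a rough guess from os.arch.

```java
// Hypothetical helper, not part of Hadoop: reports whether this JVM looks 32- or 64-bit.
final class JvmBitness {

    /** Returns "32", "64", or "unknown" for the JVM running this code. */
    static String dataModel() {
        // HotSpot-specific property; it may be absent on other JVMs.
        String model = System.getProperty("sun.arch.data.model");
        if ("32".equals(model) || "64".equals(model)) {
            return model;
        }
        // Rough fallback: architecture names containing "64" are treated as 64-bit.
        String arch = System.getProperty("os.arch", "");
        return arch.contains("64") ? "64" : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel());
        System.out.println("os.arch:        " + System.getProperty("os.arch"));
    }
}
```

Running this class with the same java binary that bin/hadoop uses should show which kind of JVM is in play; a 32-bit result would make a 3000M heap request unrealistic regardless of other settings.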
-
Re: load a serialized object in hadoop
M. C. Srivas 2010-10-13, 02:59
On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <[EMAIL PROTECTED]> wrote:
> I set the mapred.child.java.opts=-Xmx3000M, still get the
> "java.lang.OutOfMemoryError: Java heap space"

The JVM reserves swap space in advance, when the process is launched. If your swap is too small (or you have no swap configured), you will hit this.

Or you are on a 32-bit machine, in which case a 3 GB heap is not possible in the JVM.

-Srivas.
-
Re: load a serialized object in hadoop
Shi Yu 2010-10-13, 15:04
As a follow-up to my own question: I think invoking the JVM in Hadoop requires much more memory than an ordinary JVM. I found that, instead of serializing the object, I could perhaps create a MapFile as an index to permit lookups by key in Hadoop. I have also compared the performance of MongoDB and Memcache. I will let you know the result after I try the MapFile approach.

Shi
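The idea behind a key-lookup file can be sketched without Hadoop: keep the records sorted by key on disk, hold only a sparse index of keys and file offsets in memory, and seek to the nearest indexed key before scanning. The code below is a JDK-only illustration of that idea and is not Hadoop's MapFile API. The class name SortedLookupFile, the String key and value types, and the index interval of 128 are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// JDK-only illustration of a sorted data file plus a sparse in-memory index.
// This is NOT Hadoop's MapFile; every name here is hypothetical.
final class SortedLookupFile {
    private static final int INDEX_INTERVAL = 128; // index every 128th key (arbitrary)

    private final String path;
    private final TreeMap<String, Long> sparseIndex = new TreeMap<String, Long>();

    private SortedLookupFile(String path) {
        this.path = path;
    }

    /** Writes entries in natural key order and records the offset of every INDEX_INTERVAL-th key. */
    static SortedLookupFile write(String path, SortedMap<String, String> entries) throws IOException {
        SortedLookupFile file = new SortedLookupFile(path);
        RandomAccessFile out = new RandomAccessFile(path, "rw");
        try {
            out.setLength(0); // start from an empty file
            int n = 0;
            for (Map.Entry<String, String> e : entries.entrySet()) {
                if (n++ % INDEX_INTERVAL == 0) {
                    file.sparseIndex.put(e.getKey(), out.getFilePointer());
                }
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } finally {
            out.close();
        }
        return file;
    }

    /** Seeks to the nearest indexed key at or before the target, then scans forward. */
    String get(String key) throws IOException {
        Map.Entry<String, Long> start = sparseIndex.floorEntry(key);
        if (start == null) {
            return null; // key sorts before the first record
        }
        RandomAccessFile in = new RandomAccessFile(path, "r");
        try {
            in.seek(start.getValue());
            while (in.getFilePointer() < in.length()) {
                String k = in.readUTF();
                String v = in.readUTF();
                int cmp = k.compareTo(key);
                if (cmp == 0) {
                    return v;
                }
                if (cmp > 0) {
                    return null; // passed the position where the key would be
                }
            }
            return null;
        } finally {
            in.close();
        }
    }
}
```

Only the sparse index lives on the heap, so memory use grows with the number of indexed keys rather than with the size of the whole map. The price is a disk seek and a short scan per lookup, and the sketch assumes the input map uses natural String ordering.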
-
Re: load a serialized object in hadoop
Matt Pouttu-Clarke 2010-10-13, 15:48
Also, Java serialization often keeps references to previously read objects in memory. It is better to use Thrift or Avro to serialize the object. In my experience, Java serialization is inefficient for large object graphs but works fine for smaller ones (depending on how much memory you have to work with).

Also, for data that small, memcache and Mongo may be overkill (unless the data changes frequently).

Cheers,
Matt
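The point about retained references can be made concrete: ObjectInputStream keeps a table of every object it has read so that later back-references in the stream can be resolved. One way around that, short of adopting Thrift or Avro, is to store the map as flat key/value records. The sketch below is a plain-JDK stand-in for that idea, not Thrift or Avro; the class name FlatMapFile is hypothetical, and it assumes String keys and values because the thread does not say what the real map holds.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: stores a map as flat (key, value) records instead of one object graph.
// Assumes String keys and values; the real map in the thread may hold other types.
final class FlatMapFile {

    /** Writes the entry count followed by one (key, value) pair per entry. */
    static void write(String path, Map<String, String> map) throws IOException {
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)));
        try {
            out.writeInt(map.size()); // count first, so the reader can presize its map
            for (Map.Entry<String, String> e : map.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } finally {
            out.close();
        }
    }

    /** Reads the records back into a HashMap without any stream-level object table. */
    static Map<String, String> read(String path) throws IOException {
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)));
        try {
            int n = in.readInt();
            // Presize to avoid rehashing while loading (default load factor is 0.75).
            Map<String, String> map = new HashMap<String, String>((int) (n / 0.75f) + 1);
            for (int i = 0; i < n; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                map.put(key, value);
            }
            return map;
        } finally {
            in.close();
        }
    }
}
```

The resulting map still has to fit in the heap, so this only removes the deserializer's bookkeeping overhead; it does not help if the map itself is too large for the configured -Xmx.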
-
Re: load a serialized object in hadoop
Luke Lu 2010-10-13, 18:30
On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu <[EMAIL PROTECTED]> wrote:
> As a follow-up to my own question: I think invoking the JVM in Hadoop
> requires much more memory than an ordinary JVM.

That's simply not true. The default MapReduce task Xmx is 200M, which is much smaller than the standard JVM default of 512M, and most users don't need to increase it. Please post the code that reads the object (in HDFS?) in your tasks.
-
Re: load a serialized object in hadoop
Shi Yu 2010-10-13, 18:45
Here is my code. There is no Map/Reduce in it. I can run this code using java -Xmx1000m; however, when using bin/hadoop -D mapred.child.java.opts=-Xmx3000M, it fails with a not-enough-heap-space error. I have run other programs in Hadoop with the same settings, so the memory is available on my machines.

    import java.io.FileInputStream;
    import java.io.ObjectInputStream;
    import java.util.HashMap;

    public class OOloadtest {
        public static void main(String[] args) {
            try {
                String myFile = "xxx.dat";
                FileInputStream fin = new FileInputStream(myFile);
                ObjectInputStream ois = new ObjectInputStream(fin);
                // Deserialize the whole map into memory in one call.
                HashMap<?, ?> margintagMap = (HashMap<?, ?>) ois.readObject();
                ois.close();
                fin.close();
            } catch (Exception e) {
                // Report the failure instead of silently ignoring it.
                e.printStackTrace();
            }
        }
    }

--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799
-
Re: load a serialized object in hadoop
Luke Lu 2010-10-13, 19:15
Can you post your mapper/reducer implementation? Or are you using Hadoop Streaming, for which mapred.child.java.opts doesn't apply to the JVM you care about? BTW, which Hadoop version are you using?
-
Re: load a serialized object in hadoop
Shi Yu 2010-10-13, 19:27
I haven't implemented anything in map/reduce yet for this issue. I am just trying to invoke the same Java class using the bin/hadoop command. The thing is, a very simple program runs fine in plain Java but not through the bin/hadoop command. I think that if I can't get through this first stage, a map/reduce program would fail too. I am using Hadoop 0.19.2. Thanks.

Best Regards,

Shi
-
Re: load a serialized object in hadoop
Luke Lu 2010-10-13, 20:28
On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu <[EMAIL PROTECTED]> wrote:
> I haven't implemented anything in map/reduce yet for this issue. I am just
> trying to invoke the same Java class using the bin/hadoop command. The thing
> is, a very simple program runs fine in plain Java but not through the
> bin/hadoop command.

If you are just running bin/hadoop jar your.jar, your code runs in a local client JVM and mapred.child.java.opts has no effect. You should run it with

    HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar your.jar
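To see which heap limit a given JVM actually received, the program can report it. The following is a minimal diagnostic sketch, not something posted in the thread; the class name HeapReport is hypothetical, and it uses only standard JDK calls (Runtime.maxMemory and the runtime MXBean's input arguments).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic class; launch it the same way as the real program.
final class HeapReport {

    /** Maximum heap this JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Options the JVM itself was started with, such as -Xmx1000m (program arguments are excluded). */
    static List<String> jvmOptions() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM options:    " + jvmOptions());
    }
}
```

Launching this class once with plain java and once through bin/hadoop jar should make the difference visible. If the reported maximum does not move when mapred.child.java.opts changes, that option is not reaching the JVM that runs main, which is consistent with the explanation above.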
-
Re: load a serialized object in hadoop
Shi Yu 2010-10-13, 21:21
Hi, thanks for the advice. I tried your settings,

    $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m

but it still has no effect. Or is this a system variable? Should I export it? How do I configure it?

Shi

    java -Xms3G -Xmx3G -classpath .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar OOloadtest
-
Re: load a serialized object in hadoop
Konstantin Boudnik 2010-10-13, 21:26
You should have no space here: "-D HADOOP_CLIENT_OPTS"

On Wed, Oct 13, 2010 at 04:21PM, Shi Yu wrote:
> Hi, thanks for the advice. I tried with your settings,
> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
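A setting can reach a Java program through three different channels: the process environment, a JVM system property, or ordinary program arguments. The sketch below prints all three for a given name so they can be told apart. It is an illustration only; the class name WhereDidItGo is made up, and it assumes that bin/hadoop jar hands everything after the class name to main as ordinary arguments, which appears to be the case but is not verified here for 0.19.2.

```java
import java.util.Arrays;

// Hypothetical diagnostic class showing three separate channels into a Java program.
final class WhereDidItGo {

    /** Describes how the given name is visible to this JVM. */
    static String describe(String name, String[] args) {
        String env = System.getenv(name);       // set by the shell: NAME=value command
        String prop = System.getProperty(name); // set by the JVM launcher: java -DNAME=value
        return "env=" + env
                + ", systemProperty=" + prop
                + ", programArgs=" + Arrays.toString(args);
    }

    public static void main(String[] args) {
        System.out.println(describe("HADOOP_CLIENT_OPTS", args));
    }
}
```

If the value shows up only under programArgs, it was passed to the program rather than to the JVM or the environment, so it cannot influence the heap size.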
-
Re: load a serialized object in hadoop
Luke Lu 2010-10-13, 21:28
On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu <[EMAIL PROTECTED]> wrote:
> Hi, thanks for the advice. I tried with your settings, > $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m > still no effect. Or this is a system variable? Should I export it? How to > configure it? HADOOP_CLIENT_OPTS is an environment variable so you should run it as HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest if you use sh derivative shells (bash, ksh etc.) prepend env for other shells. __Luke > Shi > > java -Xms3G -Xmx3G -classpath > .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar > OOloadtest > > > On 2010-10-13 15:28, Luke Lu wrote: >> >> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<[EMAIL PROTECTED]> wrote: >> >>> >>> I haven't implemented anything in map/reduce yet for this issue. I just >>> try >>> to invoke the same java class using bin/hadoop command. The thing is >>> a >>> very simple program could be executed in Java, but not doable in >>> bin/hadoop >>> command. >>> >> >> If you are just trying to use bin/hadoop jar your.jar command, your >> code runs in a local client jvm and mapred.child.java.opts has no >> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop >> jar your.jar >> >> >>> >>> I think if I couldn't get through the first stage, even I had a >>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks. >>> >>> Best Regards, >>> >>> Shi >>> >>> On 2010-10-13 14:15, Luke Lu wrote: >>> >>>> >>>> Can you post your mapper/reducer implementation? or are you using >>>> hadoop streaming? for which mapred.child.java.opts doesn't apply to >>>> the jvm you care about. BTW, what's the hadoop version you're using? >>>> >>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<[EMAIL PROTECTED]> wrote: >>>> >>>> >>>>> >>>>> Here is my code. There is no Map/Reduce in it. 
I could run this code >>>>> using >>>>> java -Xmx1000m , however, when using bin/hadoop -D >>>>> mapred.child.java.opts=-Xmx3000M it has heap space not enough error. >>>>> I >>>>> have tried other program in Hadoop with the same settings so the memory >>>>> is >>>>> available in my machines. >>>>> >>>>> >>>>> public static void main(String[] args) { >>>>> try{ >>>>> String myFile = "xxx.dat"; >>>>> FileInputStream fin = new FileInputStream(myFile); >>>>> ois = new ObjectInputStream(fin); >>>>> margintagMap = ois.readObject(); >>>>> ois.close(); >>>>> fin.close(); >>>>> }catch(Exception e){ >>>>> // >>>>> } >>>>> } >>>>> >>>>> On 2010-10-13 13:30, Luke Lu wrote: >>>>> >>>>> >>>>>> >>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<[EMAIL PROTECTED]> >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> As a coming-up to the my own question, I think to invoke the JVM in >>>>>>> Hadoop >>>>>>> requires much more memory than an ordinary JVM. >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which >>>>>> is much smaller than the standard jvm default 512M and most users >>>>>> don't need to increase it. Please post the code reading the object (in >>>>>> hdfs?) in your tasks. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> I found that instead of >>>>>>> serialization the object, maybe I could create a MapFile as an index >>>>>>> to >>>>>>> permit lookups by key in Hadoop. I have also compared the performance >>>>>>> of >>>>>>> MongoDB and Memcache. I will let you know the result after I try the >>>>>>> MapFile >>>>>>> approach. >>>>>>> >>>>>>> Shi >>>>>>> >>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<[EMAIL PROTECTED]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of >>>>>>>>>> stored
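
Because the class started by bin/hadoop jar runs in a local client JVM, one way to see which heap setting actually took effect is to print the limit from inside that JVM. The following is a minimal sketch, not code from the thread: the class name HeapReport is hypothetical, and it uses only the standard Runtime.maxMemory() call.

```java
// Minimal sketch: report the heap limit the current JVM actually received.
// The class name HeapReport is hypothetical and does not appear in the thread.
class HeapReport {

    /** Returns the maximum heap this JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run once with plain java -Xmx... and once through bin/hadoop jar
        // to compare what each launcher passes to the JVM.
        System.out.println("max heap (MB): " + maxHeapMegabytes());
    }
}
```

If the value printed under bin/hadoop stays at the default while the plain java run shows the requested size, the option is not reaching the client JVM.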
-
Re: load a serialized object in hadoop
Shi Yu 2010-10-13, 23:18
Hi, I tried the following five ways:

Approach 1: in command line

    HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest

Approach 2: I added the hadoop-site.xml file with the following element.
Each time I changed, I stop and restart hadoop on all the nodes.

    ...
    <property>
    <name>HADOOP_CLIENT_OPTS</name>
    <value>-Xmx4000m</value>
    </property>

run the command

    $bin/hadoop jar WordCount.jar OOloadtest

Approach 3: I changed like this

    ...
    <property>
    <name>HADOOP_CLIENT_OPTS</name>
    <value>4000m</value>
    </property>
    ....

Then run the command:

    $bin/hadoop jar WordCount.jar OOloadtest

Approach 4: To make sure, I changed the "m" to numbers, that was

    ...
    <property>
    <name>HADOOP_CLIENT_OPTS</name>
    <value>4000000000</value>
    </property>
    ....

Then run the command:

    $bin/hadoop jar WordCount.jar OOloadtest

All these four approaches come to the same "Java heap space" error.

    java.lang.OutOfMemoryError: Java heap space
        at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
        at java.lang.StringBuilder.<init>(StringBuilder.java:68)
        at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:2997)
        at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2818)
        at java.io.ObjectInputStream.readString(ObjectInputStream.java:1599)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1320)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
        at java.util.HashMap.readObject(HashMap.java:1028)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1846)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
        at ObjectManager.loadObject(ObjectManager.java:42)
        at OOloadtest.main(OOloadtest.java:21)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

Approach 5: In comparison, I called the Java command directly as follows
(there is a counter showing how much time it costs if the serialized object
is successfully loaded):

    $java -Xms3G -Xmx3G -classpath .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest

return:

    object loaded, timing (hms): 0 hour(s) 1 minute(s) 12 second(s) 162millisecond(s)

What was the problem in my command? Where can I find the documentation
about HADOOP_CLIENT_OPTS? Have you tried the same thing and found it works?

Shi
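
The stack trace shows the failure inside HashMap.readObject, reached from ObjectManager.loadObject. For reference, here is a minimal sketch of such a loader. It is an illustration under stated assumptions, not the poster's ObjectManager: the class name MapLoader, the String key and value types, and the buffered streams are all assumptions, and the try-with-resources syntax needs Java 7 or later. Buffering does not reduce the heap needed for the map itself.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical helper; the key and value types are assumed to be String.
class MapLoader {

    /** Serializes the map to the given file through a buffered stream. */
    static void save(HashMap<String, String> map, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(map);
        }
    }

    /** Reads the map back; exceptions propagate instead of being swallowed. */
    @SuppressWarnings("unchecked")
    static HashMap<String, String> load(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (HashMap<String, String>) in.readObject();
        }
    }

    /** Writes a one-entry map to a temp file, reads it back, and compares. */
    static boolean roundTrips() {
        try {
            File file = File.createTempFile("map", ".dat");
            file.deleteOnExit();
            HashMap<String, String> original = new HashMap<>();
            original.put("key", "value");
            save(original, file);
            return original.equals(load(file));
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrips());
    }
}
```

Letting the exception propagate, rather than catching it in an empty block, keeps an OutOfMemoryError or I/O failure visible in whichever launcher is used.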
-
Re: load a serialized object in hadoop
Bharath Mundlapudi 2010-10-14, 00:03
If you are running a 32-bit JVM then, depending on the OS, you can't go beyond ~3GB. If you are on Windows, a 2GB heap is your best bet for a 32-bit JVM.

Try this: edit conf/hadoop-env.sh to add

    export HADOOP_CLIENT_OPTS="-Xmx3G ${HADOOP_CLIENT_OPTS}"

and then run your hadoop commands.

-Bharath
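
Whether the 32-bit limit applies can be checked from inside the JVM. The sketch below is only an illustration: the class name JvmBitness is hypothetical, and the sun.arch.data.model property is specific to Sun/Oracle HotSpot JVMs, so other JVMs may not set it (the code falls back to "unknown").

```java
// Hypothetical helper that reports the architecture the JVM is running as.
class JvmBitness {

    /** Builds a one-line description from standard and HotSpot-specific properties. */
    static String describe() {
        String arch = System.getProperty("os.arch");
        // HotSpot-specific; typically "32" or "64" where present.
        String dataModel = System.getProperty("sun.arch.data.model", "unknown");
        return "os.arch=" + arch + ", data model=" + dataModel;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```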
-
Re: load a serialized object in hadoop
Luke Lu 2010-10-14, 00:04
Just took a look at the bin/hadoop of your particular version
(http://svn.apache.org/viewvc/hadoop/common/tags/release-0.19.2/bin/hadoop?revision=796970&view=markup).
It looks like HADOOP_CLIENT_OPTS doesn't work with the jar command in that version, which is fixed in later versions.

So try

    HADOOP_OPTS=-Xmx1000M bin/hadoop ...

instead. It would work because it just translates to the same java command line that worked for you :)

__Luke
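
Since the launcher script ultimately builds a java command line, the flags that actually arrived can be listed from inside the running program. This is a minimal sketch using the standard RuntimeMXBean.getInputArguments() call; the class name JvmArgs and the helper heapFlags are hypothetical names chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows which heap-related flags the JVM was started with.
class JvmArgs {

    /** Keeps only the -Xmx and -Xms entries from a list of JVM arguments. */
    static List<String> heapFlags(List<String> jvmArguments) {
        List<String> flags = new ArrayList<>();
        for (String argument : jvmArguments) {
            if (argument.startsWith("-Xmx") || argument.startsWith("-Xms")) {
                flags.add(argument);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to main().
        List<String> all = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("heap flags seen by this JVM: " + heapFlags(all));
    }
}
```

An empty list under bin/hadoop jar would suggest that the variable being set is not the one the script reads for that command.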
-
Re: load a serialized object in hadoop
Shi Yu 2010-10-14, 00:13
Hi, I got it, it should be declared in hadoop-env.sh:

    export HADOOP_CLIENT_OPTS=-Xmx4000m

Thanks! At the same time I see corrections coming in.

Shi
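
To see whether a variable exported this way is visible to the launched program, the environment can be printed from Java. This is a sketch only: the class name HadoopEnv is hypothetical, and it assumes the launcher script passes its exported variables on to the java process it starts.

```java
// Hypothetical helper: prints the Hadoop-related environment variables
// named in this thread as the JVM sees them.
class HadoopEnv {

    /** Formats one variable, marking it clearly when it is not set. */
    static String show(String name) {
        String value = System.getenv(name);
        return name + "=" + (value == null ? "<unset>" : value);
    }

    public static void main(String[] args) {
        String[] names = {"HADOOP_CLIENT_OPTS", "HADOOP_OPTS", "HADOOP_HEAPSIZE"};
        for (String name : names) {
            System.out.println(show(name));
        }
    }
}
```

Seeing the variable here only confirms that it was exported; whether the script turns it into a JVM option is a separate question.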
-
Re: load a serialized object in hadoop
Shi Yu 2010-10-14, 00:31
Thanks. Well, I set the value to 3000M in hadoop-env.sh so it has the same configuration as Java:

    export HADOOP_HEAPSIZE=3000
    export HADOOP_CLIENT_OPTS=3000

Then I did the comparison:

    sheeyu@ocuic3:~/hadoop/hadoop-0.19.2$ bin/hadoop jar WordCount.jar OOloadtest
    timing (hms): 0 hour(s) 2 minute(s) 53 second(s) 599millisecond(s)

    sheeyu@ocuic3:~/hadoop/hadoop-0.19.2$ java -Xms3G -Xmx3G -classpath .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
    timing (hms): 0 hour(s) 1 minute(s) 14 second(s) 7millisecond(s)

It seems the hadoop command takes more than twice as long (about 174 s versus 74 s). I guess that is because I didn't set the initial heap size correctly (corresponding to the -Xms option in Java). I tried like this:

    sheeyu@ocuic3:~/hadoop/hadoop-0.19.2$ bin/hadoop jar WordCount.jar OOloadtest -D mapred.child.java.opts=-Xmx3000M
    timing (hms): 0 hour(s) 3 minute(s) 0 second(s) 343millisecond(s)

I also tried

    HADOOP_OPTS=-Xmx3000M bin/hadoop jar WordCount.jar OOloadtest
    timing (hms): 0 hour(s) 3 minute(s) 7 second(s) 774millisecond(s)

Now it works! So how do I set the initial heap size? With HADOOP_NAMENODE_OPTS or HADOOP_CLIENT_OPTS? I ask because the difference in speed is still more than twofold.

Shi
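
The first two timings reported in this message can be compared directly by converting each to milliseconds. The sketch below does only that arithmetic; the class name TimingRatio is hypothetical, and the figures are the bin/hadoop and plain java runs quoted in the message.

```java
// Hypothetical helper: converts the reported h/m/s/ms timings to
// milliseconds and prints how the two runs compare.
class TimingRatio {

    /** Converts hours, minutes, seconds and milliseconds to total milliseconds. */
    static long toMillis(long hours, long minutes, long seconds, long millis) {
        return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
    }

    public static void main(String[] args) {
        long viaHadoop = toMillis(0, 2, 53, 599); // bin/hadoop jar run
        long viaJava = toMillis(0, 1, 14, 7);     // plain java -Xms3G -Xmx3G run
        double ratio = (double) viaHadoop / viaJava;
        System.out.printf("%d ms vs %d ms, ratio %.2f%n", viaHadoop, viaJava, ratio);
    }
}
```

The totals are 173599 ms and 74007 ms, so the bin/hadoop run took somewhat more than twice as long as the direct java run.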
-
Re: load a serialized object in hadoop
Shi Yu 2010-10-14, 02:02
Just a reminder and a warning: changing the HADOOP_CLIENT_OPTS and HADOOP_OPTS items in hadoop-env.sh improperly may cause Hadoop to crash.