Re: Memory distribution for Hadoop/Hbase processes
You are way underpowered. I don't think you are going to get reasonable
performance out of this hardware with so many processes running on it
(especially memory-heavy processes like HBase); obviously the severity
depends on your use case.

I would say you can decrease the memory allocation for the namenode,
datanode, secondary namenode, HBase master, and ZooKeeper, and increase the
allocation for the region servers.
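
For example (a rough sketch only -- the variable names below are from the
stock hadoop-env.sh / hbase-env.sh of the 1.x-era releases, so double-check
them against your versions), per-daemon heaps can be pinned with -Xmx
overrides:

# hadoop-env.sh -- shrink the HDFS daemons
export HADOOP_NAMENODE_OPTS="-Xmx512m $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xmx512m $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx512m $HADOOP_SECONDARYNAMENODE_OPTS"

# hbase-env.sh -- shrink master and ZK, give the savings to the region server
export HBASE_MASTER_OPTS="-Xmx512m $HBASE_MASTER_OPTS"
export HBASE_ZOOKEEPER_OPTS="-Xmx512m $HBASE_ZOOKEEPER_OPTS"
export HBASE_REGIONSERVER_OPTS="-Xmx2048m $HBASE_REGIONSERVER_OPTS"

The exact numbers matter less than keeping the sum of all -Xmx values
comfortably below physical RAM.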
 
Regards,
Dhaval
________________________________
 From: Vimal Jain <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wednesday, 7 August 2013 12:47 PM
Subject: Re: Memory distribution for Hadoop/Hbase processes
 

Hi Ted,
I am using CentOS.
I could not get the output of "ps aux | grep pid" as HBase/Hadoop is
currently down in production for internal reasons.

Can you please help me figure out the memory distribution for my
single-node cluster (pseudo-distributed mode)?
Currently it has just 4 GB of RAM, and I can try to take it up to 6 GB.
So I have come up with the following distribution:

Name node - 512 MB
Data node - 1024 MB
Secondary Name node - 512 MB

HMaster - 512 MB
HRegion - 2048 MB
ZooKeeper - 512 MB

So the total memory allocation is 5 GB, and I still have 1 GB left for the OS.

1) So is it fine to go ahead with this configuration in production? (I am
asking because I had "long GC pause" problems in the past when I did not
change the JVM memory allocation in hbase-env.sh and hadoop-env.sh, so each
of the 6 processes took the default 1 GB, for a total allocation of 6 GB
against only 4 GB of RAM. After that I assigned 1.5 GB to HRegion and 512 MB
each to HMaster and ZooKeeper, but forgot to change it for the Hadoop
processes. I also changed the kernel parameter vm.swappiness to 0. After
this, it was working fine.)
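
(For reference, the swappiness change I made was the standard sysctl tweak,
shown here from memory in the usual CentOS way:

sysctl -w vm.swappiness=0                      # apply immediately
echo 'vm.swappiness = 0' >> /etc/sysctl.conf   # persist across reboots

This keeps the kernel from swapping out the JVM heaps, which is what was
stretching the GC pauses.)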

2) Currently I am running in pseudo-distributed mode, as my data size is at
most 10-15 GB at present. How easy is it to migrate from pseudo-distributed
mode to fully distributed mode in the future when my data size increases?
(which will surely be the case)
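
(From what I understand of the docs, the switch is mostly configuration --
roughly something like this in hbase-site.xml, plus listing the new nodes in
the conf/regionservers file -- please correct me if it is more involved:

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode-host:9000/hbase</value>
</property>

Here "namenode-host" is just a placeholder for the real NameNode address.)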

Thanks for your help. Really appreciate it.
On Sun, Aug 4, 2013 at 8:12 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:

> My questions are:
> 1) How is this thing working?
> It is working because Java can over-allocate memory. You will know you are
> using too much memory when the kernel starts killing processes.
> 2) I just have one table whose size at present is about 10-15 GB, so what
> should be the ideal memory distribution?
> Really you should get a box with more memory. You can currently only hold
> about ~400 MB in memory.
> On Aug 4, 2013 9:58 AM, "Ted Yu" <[EMAIL PROTECTED]> wrote:
>
> > What OS are you using ?
> >
> > What is the output from the following command ?
> >  ps aux | grep pid
> > where pid is the process Id for Namenode, Datanode, etc.
> >
> > Cheers
> >
> > On Sun, Aug 4, 2013 at 6:33 AM, Vimal Jain <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > > I have configured HBase in pseudo-distributed mode with HDFS as the
> > > underlying storage. I am not using the MapReduce framework as of now.
> > > I have 4 GB of RAM.
> > > Currently I have the following distribution of memory:
> > >
> > > Data Node, Name Node, Secondary Name Node each: 1000 MB (the default
> > > HADOOP_HEAPSIZE)
> > >
> > > Hmaster - 512 MB
> > > HRegion - 1536 MB
> > > Zookeeper - 512 MB
> > >
> > > So the total heap allocation becomes 5.5 GB, which is absurd as my
> > > total RAM is only 4 GB, but still the setup is working fine in
> > > production. :-0
> > >
> > > My questions are:
> > > 1) How is this thing working?
> > > 2) I just have one table whose size at present is about 10-15 GB, so
> > > what should be the ideal memory distribution?
> > > --
> > > Thanks and Regards,
> > > Vimal Jain
> > >
> >
>

--
Thanks and Regards,
Vimal Jain