HBase >> mail # user >> Bulk loading job failed when one region server went down in the cluster

Re: Bulk loading job failed when one region server went down in the cluster

I don't know if you can call it a bug if you don't have enough memory available.

I mean if you don't use HBase, then you may have more leeway in terms of swap.

You can also tune HBase further to handle the additional latency found in a virtual environment.

Why don't you rebuild your VMs to be slightly larger in terms of memory?
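
The memory budget described in the book excerpt quoted further down can be worked through with a few lines of Python. This is only a rough sketch: the function name `container_memory_mb` and the 1,000 MB set aside for an HBase region server are assumptions for illustration, not figures from this thread. The 3 GB node size and the 1,000 MB per Hadoop daemon come from the quoted messages.

```python
def container_memory_mb(node_total_mb, daemon_mb=1000, daemons=2, other_mb=0):
    """Return the memory (in MB) left for YARN containers on one node.

    daemon_mb * daemons covers the Hadoop daemons (datanode + node manager);
    other_mb is whatever else runs on the machine (hypothetical value here).
    The result is a candidate for yarn.nodemanager.resource.memory-mb.
    """
    reserved = daemon_mb * daemons + other_mb
    return max(node_total_mb - reserved, 0)


# 3 GB node running only a datanode and a node manager.
print(container_memory_mb(3072))                  # 1072

# Same node with an assumed 1,000 MB reserved for an HBase region server.
print(container_memory_mb(3072, other_mb=1000))   # 72
```

On a 3 GB node this leaves roughly 1 GB for containers with only the two Hadoop daemons, and almost nothing once a region server is added, which is why the 8,192 MB default cannot work here.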
On Aug 13, 2012, at 8:05 PM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Mike,
> You hit the nail on the head: I need to lower the memory by setting
> yarn.nodemanager.resource.memory-mb. That touches on another major bug in
> YARN. I already tried setting that property to 1500 MB in
> yarn-site.xml and setting yarn.app.mapreduce.am.resource.mb to 1000 MB in
> mapred-site.xml. With this change, the YARN job does not run at all
> even though the configuration is right. It's a bug and I have to file a
> JIRA for it. So my only option was to let it run with an
> incorrect YARN configuration, since my objective is to load data into HBase
> rather than to play with YARN. MapReduce is only used for bulk loading in my
> cluster.
> Here is a link to the mailing list email about running YARN with less
> memory:
> http://permalink.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/33164
> It would be great if you can answer this simple question of mine: Is HBase
> Bulk Loading fault tolerant to Region Server failures in a viable/decent
> environment?
> Thanks,
> Anil Gupta
> On Mon, Aug 13, 2012 at 5:17 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>> Not sure why you're having an issue getting an answer.
>> Even if you're not a YARN expert, Google is your friend.
>> See:
>> http://books.google.com/books?id=Wu_xeGdU4G8C&pg=PA323&lpg=PA323&dq=Hadoop+YARN+setting+number+of+slots&source=bl&ots=i7xQYwQf-u&sig=ceuDmiOkbqTqok_HfIr3udvm6C0&hl=en&sa=X&ei=8JYpUNeZJMnxygGzqIGwCw&ved=0CEQQ6AEwAQ#v=onepage&q=Hadoop%20YARN%20setting%20number%20of%20slots&f=false
>> This is a web page from Tom White's 3rd Edition.
>> The bottom line...
>> -=-
>> The considerations for how much memory to dedicate to a node manager for
>> running containers are similar to those discussed in
>> “Memory” on page 307. Each Hadoop daemon uses 1,000 MB, so for a datanode
>> and a node manager, the total is 2,000 MB. Set aside enough for other
>> processes that are running on the machine, and the remainder can be
>> dedicated to the node manager’s containers by setting the configuration
>> property yarn.nodemanager.resource.memory-mb to the total allocation in MB.
>> (The default is 8,192 MB.)
>> -=-
>> Taken per fair use. Page 323
>> As you can see, you need to drop this down to something like 1 GB, if you
>> even have enough memory for that.
>> Again, set yarn.nodemanager.resource.memory-mb to a more realistic value.
>> 8 GB on a 3 GB node? Yeah, that would really hose you, especially if you're
>> trying to run HBase too.
>> Even here... You really don't have enough memory to do it all. (Maybe
>> enough to do a small test)
>> Good luck.
>> On Aug 13, 2012, at 3:24 PM, anil gupta <[EMAIL PROTECTED]> wrote:
>>> Hi Mike,
>>> Here is the link to my email on Hadoop list regarding YARN problem:
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201208.mbox/%3CCAF1+Vs8oF4VsHbg14B7SGzBB_8Ty7GC9Lw3nm1bM0v+[EMAIL PROTECTED]%3E
>>> Somehow the link to the Cloudera mail in my last email does not seem to work.
>>> Here is the new link:
>> https://groups.google.com/a/cloudera.org/forum/?fromgroups#!searchin/cdh-user/yarn$20anil/cdh-user/J564g9A8tPE/ZpslzOkIGZYJ%5B1-25%5D
>>> Thanks for your help,
>>> Anil Gupta
>>> On Mon, Aug 13, 2012 at 1:14 PM, anil gupta <[EMAIL PROTECTED]>
>> wrote:
>>>> Hi Mike,
>>>> I tried doing that by setting properties in mapred-site.xml, but YARN
>>>> doesn't seem to work with the "mapreduce.tasktracker.map.tasks.maximum"
>>>> property. Here is a reference to a discussion to same