HBase, mail # user - Bulk loading job failed when one region server went down in the cluster


Re: Bulk loading job failed when one region server went down in the cluster
anil gupta 2012-08-15, 22:13
Hi Stack,

Thanks for answering my question. I admit that I am unable to run
MR2 (YARN) jobs efficiently on my cluster because of a major bug in YARN
that prevents me from setting the right configuration for MapReduce jobs.

The RSs are dying with LeaseExpiredException or YouAreDeadException
because the slaves are overloaded by the improper YARN configuration. Once
the MR job finishes, HBase performance is OK. I am not using this cluster
for performance metrics because we won't be using virtualization in
production.

The purpose of my post was to find out whether Bulk Loading is fault
tolerant to RS failures. Your answer is sufficient to clear my
doubts.

Thanks,
Anil

On Wed, Aug 15, 2012 at 2:52 PM, Stack <[EMAIL PROTECTED]> wrote:

> On Mon, Aug 13, 2012 at 6:05 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> > It would be great if you can answer this simple question of mine: Is HBase
> > Bulk Loading fault tolerant to Region Server failures in a viable/decent
> > environment?
> >
>
> Bulk Loading is a MapReduce job.  Bulk Loading is as 'fault tolerant'
> as MapReduce is (MapReduce jobs have long timeouts -- ten minutes IIRC
> -- and tasks are retried up to a maximum, 4 by default, but once all
> timeouts and retries are exhausted, the job will fail).
>
> You have RSs failing, maybe because you have too many slots allocated
> to MapReduce for the hardware you are using to PoC (as Michael Segel
> suggests).  Maybe the MR task is not finding the region's new
> locations in time or maybe the regions are not coming back on line in
> time for the MR job to complete?
>
> The logs you provide for the MR task show us failing to go against an
> RS that has died but doesn't know it yet (the YouAreDeadException).
> Try looking at the subsequent map tasks that fail.  Why are they
> failing?  For the same reason?  Look in the master log to see what's
> happening around log splitting of the failed server.  Is it hung up,
> preventing the regions from being assigned to new locations?
>
> St.Ack
>

--
Thanks & Regards,
Anil Gupta
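
For reference, the retry behaviour Stack describes above can be sketched in plain Java. This is only an illustrative model, not Hadoop or HBase code: the class and method names (BulkLoadRetrySketch, runTask, worstCaseWaitMs) are hypothetical, and the constants are the defaults quoted in the thread (4 attempts, ten-minute timeout), which in MR2 are assumed to correspond to the mapreduce.map.maxattempts and mapreduce.task.timeout properties.

```java
import java.util.function.IntPredicate;

// Hypothetical model of MapReduce task retries; no Hadoop classes are used.
class BulkLoadRetrySketch {
    // Defaults quoted in the thread (assumed to map to
    // mapreduce.map.maxattempts and mapreduce.task.timeout in MR2).
    static final int MAX_ATTEMPTS = 4;
    static final long TASK_TIMEOUT_MS = 600_000L; // ten minutes

    // Returns the 1-based attempt number that succeeded, or -1 if every
    // attempt failed, in which case the whole job would be marked failed.
    static int runTask(IntPredicate attemptSucceeds, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attemptSucceeds.test(attempt)) {
                return attempt;
            }
        }
        return -1;
    }

    // Rough upper bound on how long one task can keep the job alive,
    // assuming every attempt hangs for the full timeout before being killed.
    static long worstCaseWaitMs(int maxAttempts, long timeoutMs) {
        return maxAttempts * timeoutMs;
    }

    public static void main(String[] args) {
        // Regions come back online in time for the third attempt.
        int recovered = runTask(attempt -> attempt >= 3, MAX_ATTEMPTS);
        // Regions never come back: all attempts fail and so does the job.
        int exhausted = runTask(attempt -> false, MAX_ATTEMPTS);

        System.out.println("succeeded on attempt: " + recovered);
        System.out.println("result when regions stay offline: " + exhausted);
        System.out.println("worst-case wait in minutes: "
                + worstCaseWaitMs(MAX_ATTEMPTS, TASK_TIMEOUT_MS) / 60_000);
    }
}
```

Under these assumptions, a bulk load survives an RS failure only if the affected regions are reassigned before a task uses up its attempts. Client-side HBase retries and the exact timeout semantics are not modelled here.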