HBase, mail # dev - Shared HDFS for HBase and MapReduce


RE: Shared HDFS for HBase and MapReduce
Vladimir Rodionov 2012-06-06, 17:49
Sure, limiting the number of slots is a way of throttling I/O for MR jobs.
If you can do this, go ahead.
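
As a rough illustration of why capping slots acts as an I/O throttle, here is a back-of-the-envelope sketch. It is not from this thread: the function names and all figures (node count, per-mapper read rate, I/O budget) are hypothetical, and real throughput depends on data locality, disks, and scan patterns.

```python
def max_mr_read_mb_per_s(nodes, map_slots_per_node, mb_per_s_per_mapper):
    """Upper bound on the aggregate read rate the MR pool can generate.

    Assumes every slot is busy and each mapper reads at the same
    (hypothetical) sustained rate.
    """
    concurrent_mappers = nodes * map_slots_per_node
    return concurrent_mappers * mb_per_s_per_mapper


def slots_for_budget(nodes, io_budget_mb_per_s, mb_per_s_per_mapper):
    """Largest per-node slot count that keeps MR under a cluster-wide budget."""
    per_node_budget = io_budget_mb_per_s / nodes
    return int(per_node_budget // mb_per_s_per_mapper)


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, mappers reading about 50 MB/s each.
    print(max_mr_read_mb_per_s(nodes=10, map_slots_per_node=2,
                               mb_per_s_per_mapper=50))
    print(slots_for_budget(nodes=10, io_budget_mb_per_s=1000,
                           mb_per_s_per_mapper=50))
```

The point is only that the slot count bounds the number of concurrent readers, so lowering it lowers the ceiling on MR-driven HDFS load.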

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Mathias Herberts [[EMAIL PROTECTED]]
Sent: Wednesday, June 06, 2012 12:19 AM
To: [EMAIL PROTECTED]
Subject: RE: Shared HDFS for HBase and MapReduce

We run M/R jobs that query HBase in a pool with a limited number of mapper
slots. It works like a charm for serving both real-time (RT) and batch
queries on HBase.
On Jun 6, 2012 6:23 AM, "Vladimir Rodionov" <[EMAIL PROTECTED]> wrote:

> You can share HBase and MR if you run MR jobs only to process data from
> HBase and do not use HBase for real-time queries.
> It is not generally advisable to run MR jobs on a live (real-time) HBase
> cluster, since HDFS can easily be saturated by MR jobs, and you will see
> much worse HBase query latency and overall query throughput.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [EMAIL PROTECTED]
>
> ________________________________________
> From: [EMAIL PROTECTED] [[EMAIL PROTECTED]] On Behalf Of Stack [[EMAIL PROTECTED]]
> Sent: Tuesday, June 05, 2012 9:07 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Shared HDFS for HBase and MapReduce
>
> On Tue, Jun 5, 2012 at 8:29 PM, Atif Khan <[EMAIL PROTECTED]> wrote:
> > My first thoughts were to create a single HDFS cluster, and then point the
> > MapReduce and HBase servers to use the common HDFS installation.  However,
> > Cloudera's Dos and Don'ts page
> > (http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/) insists that
> > MapReduce and HBase should not share an HDFS cluster.  Rather they should
> > have their own individual clusters.  I don't understand this recommendation,
> > as it would result in moving data around from one HDFS cluster to another
> > when running MapReduce over HBase.
> >
>
> It starts out "Be careful when running mixed workloads on an HBase
> cluster."  Does your use case fit the case described: "...SLAs on
> hbase access" and at the same time running heavy mapreduce jobs on
> same cluster?  If so, you may want to do the suggested two clusters.
>
> I'd suggest you start w/ all on the one cluster and see how you do.
> That post is > a year old.  HBase has gotten steadily better since.
>
> St.Ack
>
> Confidentiality Notice:  The information contained in this message,
> including any attachments hereto, may be confidential and is intended to be
> read only by the individual or entity to whom this message is addressed. If
> the reader of this message is not the intended recipient or an agent or
> designee of the intended recipient, please note that any review, use,
> disclosure or distribution of this message or its attachments, in any form,
> is strictly prohibited.  If you have received this message in error, please
> immediately notify the sender and/or [EMAIL PROTECTED] and
> delete or destroy any copy of this message and its attachments.
>
