HBase >> mail # user >> HBase M/R with M/R and HBase not on same cluster


David Koch 2013-03-18, 22:09
Jean-Daniel Cryans 2013-03-18, 22:26
David Koch 2013-03-24, 12:49
Michael Segel 2013-03-25, 16:12

Re: HBase M/R with M/R and HBase not on same cluster
Hi Michael,

The reason is that cluster B is a production environment with jobs running
on it non-stop, and I do not want to take resources away from it. Secondly,
the "destination" cluster A is a much less powerful test environment, so
even if the job ran on B, the slow HBase sink on cluster A would be
a bottleneck.

What I did in the end was run a regular job on cluster A with input path
set to a file on cluster B.
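
As a rough illustration of the "input path on cluster B" part, the sketch
below builds a fully qualified HDFS URI that names the remote NameNode
explicitly, so that a job submitted on cluster A does not resolve the path
against its own default filesystem. The host name, port 8020 and file path
are placeholders rather than values from this thread, and the class name
RemoteInputPath is made up for the example. It only uses java.net.URI;
handing the result to the job (for example through
FileInputFormat.addInputPath with a Path built from the URI) is left out.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop or HBase.
final class RemoteInputPath {

    // Builds hdfs://<host>:<port><absolutePath> for a file on another cluster.
    static URI onCluster(String nameNodeHost, int nameNodePort, String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + absolutePath);
        }
        try {
            // The multi-argument constructor quotes illegal characters in the path.
            return new URI("hdfs", null, nameNodeHost, nameNodePort, absolutePath, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build input URI", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder NameNode address and path for cluster B (assumptions).
        URI input = onCluster("nn-b.example.com", 8020, "/user/david/export/part-00000");
        System.out.println(input);
    }
}
```

This assumes the nodes of cluster A can reach the NameNode and DataNodes of
cluster B over the network.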

/David

On Mon, Mar 25, 2013 at 5:12 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Just out of curiosity...
>
> Why do you want to run the job on Cluster A that reads from Cluster B but
> writes to Cluster A?
>
> Wouldn't it be easier to run the job on Cluster B and inside the
> Mapper.setup() you create your own configuration for your second cluster
> for output?
>
>
> On Mar 24, 2013, at 7:49 AM, David Koch <[EMAIL PROTECTED]> wrote:
>
> > Hello J-D,
> >
> > Thanks, it was instructive to look at the source. However, I am now stuck
> > with getting HBase to honor the "hbase.mapred.output.quorum" setting. I
> > opened a separate topic for this.
> >
> > Regards,
> >
> > /David
> >
> >
> > On Mon, Mar 18, 2013 at 11:26 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >
> >> Check out how CopyTable does it:
> >>
> >> https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/CopyTable.java
> >>
> >> J-D
> >>
> >> On Mon, Mar 18, 2013 at 3:09 PM, David Koch <[EMAIL PROTECTED]> wrote:
> >>> Hello,
> >>>
> >>> Is it possible to run an M/R job on cluster A over a table that resides on
> >>> cluster B with output to a table on cluster A? If so, how?
> >>>
> >>> I am interested in doing this for the purpose of copying part of a table
> >>> from B to A. Cluster B is a production environment, cluster A is a slow
> >>> test platform. I do not want the M/R to run on B since it would block
> >>> precious slots on this cluster. Otherwise I could just run CopyTable on
> >>> cluster B and specify cluster A as output quorum.
> >>>
> >>> Could this work by pointing the client configuration at the mapred-site.xml
> >>> of cluster A and the hdfs-site.xml and hbase-site.xml of cluster B? In this
> >>> scenario - in order to output to cluster A I guess I'd have to set
> >>> TableOutputFormat.QUORUM_ADDRESS to cluster A.
> >>>
> >>> I use a client configuration generated by CDH4 and there are some other
> >>> files floating around - such as core-site.xml, not sure what to do with
> >>> that.
> >>>
> >>> Thank you,
> >>>
> >>> /David
> >>
>
>
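
The "hbase.mapred.output.quorum" setting mentioned in the thread (the value
of TableOutputFormat.QUORUM_ADDRESS) expects a cluster key of the form
<ZooKeeper quorum>:<client port>:<znode parent>, the same format CopyTable
accepts for --peer.adr. The sketch below only assembles that key and the
two output properties as plain strings; the ZooKeeper host names, the table
name and the class name OutputClusterSettings are placeholders invented for
the example, and copying the entries into the job's Configuration is left
out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop or HBase.
final class OutputClusterSettings {

    // Joins the parts into "host1,host2:port:/znode".
    static String clusterKey(List<String> zkHosts, int clientPort, String znodeParent) {
        if (zkHosts.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper host is required");
        }
        if (!znodeParent.startsWith("/")) {
            throw new IllegalArgumentException("znode parent must start with '/'");
        }
        return String.join(",", zkHosts) + ":" + clientPort + ":" + znodeParent;
    }

    // Property names are the string values behind TableOutputFormat.OUTPUT_TABLE
    // and TableOutputFormat.QUORUM_ADDRESS.
    static Map<String, String> forOutputTable(String table, String clusterKey) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hbase.mapred.outputtable", table);
        settings.put("hbase.mapred.output.quorum", clusterKey);
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder ZooKeeper hosts and table name for cluster A (assumptions).
        String key = clusterKey(Arrays.asList("zk1-a.example.com", "zk2-a.example.com"), 2181, "/hbase");
        forOutputTable("copy_target", key).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```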