

Re: HBase M/R with M/R and HBase not on same cluster
Hi Michael,

The reason is that cluster B is a production environment with jobs running
on it non-stop, and I do not want to take resources away from it. Secondly,
the "destination" cluster A is a much less powerful test environment, so
even if the job ran on B, the slow HBase sink on cluster A would be the
bottleneck.

What I did in the end was run a regular job on cluster A with input path
set to a file on cluster B.
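
Roughly along these lines (an untested sketch; the class names, the paths
and cluster B's namenode URI are placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CopyFromB {

  // Map-only pass-through: drop the byte offset, keep each line as-is.
  public static class LineMapper
      extends Mapper<LongWritable, Text, NullWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(NullWritable.get(), value);
    }
  }

  public static void main(String[] args) throws Exception {
    // The client *-site.xml files on the classpath point at cluster A,
    // so the job is submitted to and runs on cluster A.
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "copy-from-cluster-b");
    job.setJarByClass(CopyFromB.class);
    job.setMapperClass(LineMapper.class);
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);

    // Fully qualified URI: the input is read from cluster B's HDFS,
    // even though the tasks execute on cluster A.
    FileInputFormat.addInputPath(job,
        new Path("hdfs://namenode-b:8020/exports/mytable"));
    // Relative path, resolved against cluster A's default filesystem.
    FileOutputFormat.setOutputPath(job, new Path("/copies/mytable"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}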

/David

On Mon, Mar 25, 2013 at 5:12 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Just out of curiosity...
>
> Why do you want to run the job on Cluster A that reads from Cluster B but
> writes to Cluster A?
>
> Wouldn't it be easier to run the job on Cluster B and inside the
> Mapper.setup() you create your own configuration for your second cluster
> for output?
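>
> Something along these lines, I mean (a rough, untested sketch; the
> cluster A quorum and the table name are made up):
>
>   import java.io.IOException;
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.KeyValue;
>   import org.apache.hadoop.hbase.client.HTable;
>   import org.apache.hadoop.hbase.client.Put;
>   import org.apache.hadoop.hbase.client.Result;
>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>   import org.apache.hadoop.hbase.mapreduce.TableMapper;
>
>   public class CrossClusterCopyMapper
>       extends TableMapper<ImmutableBytesWritable, Put> {
>
>     private HTable sink;
>
>     @Override
>     protected void setup(Context context) throws IOException {
>       // Separate configuration pointing at cluster A's ZooKeeper quorum,
>       // independent of the job's own configuration (cluster B).
>       Configuration sinkConf = HBaseConfiguration.create();
>       sinkConf.set("hbase.zookeeper.quorum", "zk-a1,zk-a2,zk-a3");
>       sink = new HTable(sinkConf, "target_table");
>     }
>
>     @Override
>     protected void map(ImmutableBytesWritable row, Result value,
>         Context context) throws IOException {
>       // Re-emit every cell of the source row as a Put against cluster A.
>       Put put = new Put(row.get());
>       for (KeyValue kv : value.raw()) {
>         put.add(kv);
>       }
>       sink.put(put);
>     }
>
>     @Override
>     protected void cleanup(Context context) throws IOException {
>       sink.close();
>     }
>   }
>
> The job itself could then use a NullOutputFormat, since all writes go
> through the HTable handle.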
>
>
> On Mar 24, 2013, at 7:49 AM, David Koch <[EMAIL PROTECTED]> wrote:
>
> > Hello J-D,
> >
> > Thanks, it was instructive to look at the source. However, I am now stuck
> > with getting HBase to honor the "hbase.mapred.output.quorum" setting. I
> > opened a separate topic for this.
> >
> > Regards,
> >
> > /David
> >
> >
> > On Mon, Mar 18, 2013 at 11:26 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >
> >> Check out how CopyTable does it:
> >>
> >> https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/CopyTable.java
> >>
> >> J-D
> >>
> >> On Mon, Mar 18, 2013 at 3:09 PM, David Koch <[EMAIL PROTECTED]> wrote:
> >>> Hello,
> >>>
> >>> Is it possible to run an M/R job on cluster A over a table that resides on
> >>> cluster B, with output to a table on cluster A? If so, how?
> >>>
> >>> I am interested in doing this for the purpose of copying part of a table
> >>> from B to A. Cluster B is a production environment, cluster A is a slow
> >>> test platform. I do not want the M/R to run on B since it would block
> >>> precious slots on this cluster. Otherwise I could just run CopyTable on
> >>> cluster B and specify cluster A as output quorum.
> >>>
> >>> Could this work by pointing the client configuration at the mapred-site.xml
> >>> of cluster A and the hdfs-site.xml and hbase-site.xml of cluster B? In this
> >>> scenario - in order to output to cluster A I guess I'd have to set
> >>> TableOutputFormat.QUORUM_ADDRESS to cluster A.
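> >>>
> >>> Something like this is what I have in mind (untested; the mapper class
> >>> and the cluster A quorum spec are placeholders):
> >>>
> >>>   Configuration conf = HBaseConfiguration.create(); // cluster B's hbase-site.xml
> >>>   Job job = Job.getInstance(conf, "partial-copy");
> >>>   Scan scan = new Scan(); // plus start/stop rows for the slice I need
> >>>
> >>>   // SliceMapper would be a TableMapper that turns each Result into a Put.
> >>>   TableMapReduceUtil.initTableMapperJob("source_table", scan,
> >>>       SliceMapper.class, ImmutableBytesWritable.class, Put.class, job);
> >>>
> >>>   // The 5th argument is the quorum of the *output* cluster (A), in the
> >>>   // form <hbase.zookeeper.quorum>:<client port>:<znode parent>; it ends
> >>>   // up in TableOutputFormat.QUORUM_ADDRESS.
> >>>   TableMapReduceUtil.initTableReducerJob("target_table", null, job, null,
> >>>       "zk-a1,zk-a2,zk-a3:2181:/hbase", null, null);
> >>>   job.setNumReduceTasks(0);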
> >>>
> >>> I use a client configuration generated by CDH4, and there are some other
> >>> files floating around, such as core-site.xml; I am not sure what to do
> >>> with those.
> >>>
> >>> Thank you,
> >>>
> >>> /David
> >>
>
>