Re: hbase.mapred.output.quorum ignored in Mapper job with HDFS source and HBase sink
Looks like MultiTableOutputFormat doesn't support this use case, since it doesn't extend TableOutputFormat:

public class MultiTableOutputFormat
    extends OutputFormat<ImmutableBytesWritable, Mutation> {
The relevant configuration is set up in TableOutputFormat#setConf():

  public void setConf(Configuration otherConf) {
    this.conf = HBaseConfiguration.create(otherConf);
    String tableName = this.conf.get(OUTPUT_TABLE);
    if (tableName == null || tableName.length() <= 0) {
      throw new IllegalArgumentException("Must specify table name");
    }
    // QUORUM_ADDRESS is the "hbase.mapred.output.quorum" property.
    String address = this.conf.get(QUORUM_ADDRESS);
    int zkClientPort = conf.getInt(QUORUM_PORT, 0);
    String serverClass = this.conf.get(REGION_SERVER_CLASS);
    String serverImpl = this.conf.get(REGION_SERVER_IMPL);
    try {
      if (address != null) {
        // Re-points the ZooKeeper quorum settings at the output cluster.
        ZKUtil.applyClusterKeyToConf(this.conf, address);
      }
      // ... rest of setConf elided
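For comparison, this is how the single-table path gets its QUORUM_ADDRESS: CopyTable calls TableMapReduceUtil.initTableReducerJob() with the peer cluster key as the quorumAddress argument, and TableOutputFormat#setConf() above applies it. A rough sketch of that wiring (the table name and cluster key below are placeholder values, not from this thread):

Job job = new Job(HBaseConfiguration.create(), "import-to-cluster-B");
job.setNumReduceTasks(0);
// The cluster key has the same <quorum>:<port>:<znode-parent> format
// as CopyTable's --peer.adr argument.
TableMapReduceUtil.initTableReducerJob(
    "output_table",                  // stored as OUTPUT_TABLE
    null,                            // no reducer; map output goes to HBase
    job,
    null,                            // default partitioner
    "zk-b.example.com:2181:/hbase",  // stored as QUORUM_ADDRESS
    null, null);                     // default server class / impl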

Mind filing a JIRA for this enhancement?
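
In the meantime, a possible workaround might be to subclass MultiTableOutputFormat and apply the output quorum yourself before the record writer is created. This is an untested sketch against the 0.92 API; the class name is mine, not part of HBase:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.zookeeper.ZKUtil;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class RemoteQuorumMultiTableOutputFormat extends MultiTableOutputFormat {
  @Override
  public RecordWriter<ImmutableBytesWritable, Mutation> getRecordWriter(
      TaskAttemptContext context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    // Mirror TableOutputFormat#setConf: re-point the ZooKeeper settings at
    // the cluster named by "hbase.mapred.output.quorum" before any HTable
    // is created for the output tables.
    String address = conf.get(TableOutputFormat.QUORUM_ADDRESS);
    if (address != null) {
      ZKUtil.applyClusterKeyToConf(conf, address);
    }
    return super.getRecordWriter(context);
  }
}

You would then pass this class to job.setOutputFormatClass() instead of MultiTableOutputFormat.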

On Sun, Mar 24, 2013 at 5:46 AM, David Koch <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I want to import a file on HDFS from one cluster A (source) into HBase
> tables on a different cluster B (destination) using a Mapper job with an
> HBase sink. Both clusters run HBase.
>
> This setup works fine:
> - Run Mapper job on cluster B (destination)
> - "mapred.input.dir" --> hdfs://<cluster-A>/<path-to-file> (file on source
> cluster)
> - "hbase.zookeeper.quorum" --> <quorum-hostname-B>
> - "hbase.zookeeper.property.clientPort" --> <quorum-port-B>
>
> I thought it should be possible to run the job on cluster A (source) and
> use "hbase.mapred.output.quorum" to insert into the tables on cluster B.
> This is what the CopyTable utility does. However, the following does not
> work. HBase looks for the destination table(s) on cluster A and NOT cluster
> B:
> - Run Mapper job on cluster A (source)
> - "mapred.input.dir" --> hdfs://<cluster-A>/<path-to-file> (file is local)
> - "hbase.zookeeper.quorum" --> <quorum-hostname-A>
> - "hbase.zookeeper.property.clientPort" --> <quorum-port-A>
> - "hbase.mapred.output.quorum" -> <quorum-hostname-B>:2181:/hbase (same as
> --peer.adr argument for CopyTable)
>
> Job setup inside the class MyJob is as follows; note that I am using
> MultiTableOutputFormat.
>
> Configuration conf = HBaseConfiguration.addHbaseResources(getConf());
> Job job = new Job(conf);
> job.setJarByClass(MyJob.class);
> job.setMapperClass(JsonImporterMapper.class);
> // Note, several output tables!
> job.setOutputFormatClass(MultiTableOutputFormat.class);
> job.setNumReduceTasks(0);
> TableMapReduceUtil.addDependencyJars(job);
> TableMapReduceUtil.addDependencyJars(job.getConfiguration());
>
> Where the Mapper class has the following skeleton:
>
> public static class JsonImporterMapper extends
>     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> { }
>
> Is this expected behaviour? How can I get the second scenario using
> "hbase.mapred.output.quorum" to work? Could the fact that I am using
> MultiTableOutputFormat instead of TableOutputFormat play a part? I am using
> HBase 0.92.1.
>
> Thank you,
>
> /David
>