HBase >> mail # user >> hbase.mapred.output.quorum ignored in Mapper job with HDFS source and HBase sink


Re: hbase.mapred.output.quorum ignored in Mapper job with HDFS source and HBase sink
Looks like MultiTableOutputFormat doesn't support this use case -
MultiTableOutputFormat doesn't extend TableOutputFormat:

public class MultiTableOutputFormat
    extends OutputFormat<ImmutableBytesWritable, Mutation> {
The relevant configuration is set up in TableOutputFormat#setConf():

  public void setConf(Configuration otherConf) {
    this.conf = HBaseConfiguration.create(otherConf);
    String tableName = this.conf.get(OUTPUT_TABLE);
    if (tableName == null || tableName.length() <= 0) {
      throw new IllegalArgumentException("Must specify table name");
    }
    String address = this.conf.get(QUORUM_ADDRESS);
    int zkClientPort = conf.getInt(QUORUM_PORT, 0);
    String serverClass = this.conf.get(REGION_SERVER_CLASS);
    String serverImpl = this.conf.get(REGION_SERVER_IMPL);
    try {
      if (address != null) {
        ZKUtil.applyClusterKeyToConf(this.conf, address);
      }
      // ... (rest of setConf elided)
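For reference, the cluster key read from QUORUM_ADDRESS (the same format CopyTable accepts via --peer.adr) has the shape quorum:port:/znode-parent, and applyClusterKeyToConf rewrites the job's ZooKeeper settings from it. A rough, dependency-free sketch of that mapping (the parsing below is illustrative, not the HBase source; the map keys match the HBase property names):

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterKey {

    // Splits a cluster key like "host1,host2:2181:/hbase" into the three
    // ZooKeeper-related HBase config values it stands for.
    public static Map<String, String> parse(String clusterKey) {
        String[] parts = clusterKey.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected quorum:port:/znode-parent, got: " + clusterKey);
        }
        Map<String, String> conf = new HashMap<>();
        conf.put("hbase.zookeeper.quorum", parts[0]);
        conf.put("hbase.zookeeper.property.clientPort", parts[1]);
        conf.put("zookeeper.znode.parent", parts[2]);
        return conf;
    }

    public static void main(String[] args) {
        // Point the output side at cluster B's quorum.
        System.out.println(parse("zk-b1,zk-b2:2181:/hbase"));
    }
}
```

Since MultiTableOutputFormat never runs this logic, the job's default quorum (cluster A's) is what its writers connect to.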

Mind filing a JIRA for this enhancement?

On Sun, Mar 24, 2013 at 5:46 AM, David Koch <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I want to import a file on HDFS from one cluster A (source) into HBase
> tables on a different cluster B (destination) using a Mapper job with an
> HBase sink. Both clusters run HBase.
>
> This setup works fine:
> - Run Mapper job on cluster B (destination)
> - "mapred.input.dir" --> hdfs://<cluster-A>/<path-to-file> (file on source
> cluster)
> - "hbase.zookeeper.quorum" --> <quorum-hostname-B>
> - "hbase.zookeeper.property.clientPort" --> <quorum-port-B>
>
> I thought it should be possible to run the job on cluster A (source) and
> use "hbase.mapred.output.quorum" to insert into the tables on cluster B.
> This is what the CopyTable utility does. However, the following does not
> work. HBase looks for the destination table(s) on cluster A and NOT cluster
> B:
> - Run Mapper job on cluster A (source)
> - "mapred.input.dir" --> hdfs://<cluster-A>/<path-to-file> (file is local)
> - "hbase.zookeeper.quorum" --> <quorum-hostname-A>
> - "hbase.zookeeper.property.clientPort" --> <quorum-port-A>
> - "hbase.mapred.output.quorum" --> <quorum-hostname-B>:2181:/hbase (same as
> the --peer.adr argument for CopyTable)
>
> Job setup inside the class MyJob is as follows; note that I am using
> MultiTableOutputFormat.
>
> Configuration conf = HBaseConfiguration.addHbaseResources(getConf());
> Job job = new Job(conf);
> job.setJarByClass(MyJob.class);
> job.setMapperClass(JsonImporterMapper.class);
> // Note, several output tables!
> job.setOutputFormatClass(MultiTableOutputFormat.class);
> job.setNumReduceTasks(0);
> TableMapReduceUtil.addDependencyJars(job);
> TableMapReduceUtil.addDependencyJars(job.getConfiguration());
>
> The Mapper class has the following frame:
>
> public static class JsonImporterMapper extends
>     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> { }
>
> Is this expected behaviour? How can I get the second scenario using
> "hbase.mapred.output.quorum" to work? Could the fact that I am using
> MultiTableOutputFormat instead of TableOutputFormat play a part? I am
> using HBase 0.92.1.
>
> Thank you,
>
> /David
>