HBase, mail # user - hbase.mapred.output.quorum ignored in Mapper job with HDFS source and HBase sink


David Koch 2013-03-24, 12:46
Ted Yu 2013-03-24, 14:35
Re: hbase.mapred.output.quorum ignored in Mapper job with HDFS source and HBase sink
David Koch 2013-03-26, 15:03
Hello Ted,

Yes, I'll put in a request and add a baseline example to reproduce the
issue.

Thank you for helping me get to the bottom of this,

/David
On Sun, Mar 24, 2013 at 3:35 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Looks like MultiTableOutputFormat doesn't support this use case -
> MultiTableOutputFormat doesn't extend TableOutputFormat:
>
> public class MultiTableOutputFormat
> extends OutputFormat<ImmutableBytesWritable, Mutation> {
> The relevant configuration is set up in TableOutputFormat#setConf():
>
>   public void setConf(Configuration otherConf) {
>     this.conf = HBaseConfiguration.create(otherConf);
>     String tableName = this.conf.get(OUTPUT_TABLE);
>     if(tableName == null || tableName.length() <= 0) {
>       throw new IllegalArgumentException("Must specify table name");
>     }
>     String address = this.conf.get(QUORUM_ADDRESS);
>     int zkClientPort = conf.getInt(QUORUM_PORT, 0);
>     String serverClass = this.conf.get(REGION_SERVER_CLASS);
>     String serverImpl = this.conf.get(REGION_SERVER_IMPL);
>     try {
>       if (address != null) {
>         ZKUtil.applyClusterKeyToConf(this.conf, address);
>       }
>
> Mind filing a JIRA for enhancement ?
>
> On Sun, Mar 24, 2013 at 5:46 AM, David Koch <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > I want to import a file on HDFS from one cluster A (source) into HBase
> > tables on a different cluster B (destination) using a Mapper job with an
> > HBase sink. Both clusters run HBase.
> >
> > This setup works fine:
> > - Run Mapper job on cluster B (destination)
> > - "mapred.input.dir" --> hdfs://<cluster-A>/<path-to-file> (file on source cluster)
> > - "hbase.zookeeper.quorum" --> <quorum-hostname-B>
> > - "hbase.zookeeper.property.clientPort" --> <quorum-port-B>
> >
> > I thought it should be possible to run the job on cluster A (source) and
> > use "hbase.mapred.output.quorum" to insert into the tables on cluster B.
> > This is what the CopyTable utility does. However, the following does not
> > work: HBase looks for the destination table(s) on cluster A, NOT cluster B:
> > - Run Mapper job on cluster A (source)
> > - "mapred.input.dir" --> hdfs://<cluster-A>/<path-to-file> (file is local)
> > - "hbase.zookeeper.quorum" --> <quorum-hostname-A>
> > - "hbase.zookeeper.property.clientPort" --> <quorum-port-A>
> > - "hbase.mapred.output.quorum" --> <quorum-hostname-B>:2181:/hbase (same as
> > the --peer.adr argument for CopyTable)
> >
> > Job setup inside the class MyJob is as follows, note I am using
> > MultiTableOutputFormat.
> >
> > Configuration conf = HBaseConfiguration.addHbaseResources(getConf());
> > Job job = new Job(conf);
> > job.setJarByClass(MyJob.class);
> > job.setMapperClass(JsonImporterMapper.class);
> > // Note, several output tables!
> > job.setOutputFormatClass(MultiTableOutputFormat.class);
> > job.setNumReduceTasks(0);
> > TableMapReduceUtil.addDependencyJars(job);
> > TableMapReduceUtil.addDependencyJars(job.getConfiguration());
> >
> > The Mapper class has the following skeleton:
> >
> > public static class JsonImporterMapper extends
> >     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> { }
> >
> > Is this expected behaviour? How can I get the second scenario using
> > "hbase.mapred.output.quorum" to work? Could the fact that I am using
> > MultiTableOutputFormat instead of TableOutputFormat play a part? I am
> > using HBase 0.92.1.
> >
> > Thank you,
> >
> > /David
> >
>
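For context, the value passed in "hbase.mapred.output.quorum" (and in CopyTable's --peer.adr) is an HBase cluster key of the form <quorum hosts>:<client port>:<znode parent>, which ZKUtil.applyClusterKeyToConf splits into the three ZooKeeper configuration properties quoted above in TableOutputFormat#setConf. Below is a minimal standalone sketch of that decomposition, with no HBase dependency; ClusterKeyDemo and parseClusterKey are hypothetical names for illustration, not HBase API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClusterKeyDemo {

    // Split a cluster key of the form "host1,host2:clientPort:znodeParent"
    // into the three ZooKeeper properties that the output quorum setting
    // is meant to override on the job configuration.
    static Map<String, String> parseClusterKey(String key) {
        String[] parts = key.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected hosts:port:znodeParent, got: " + key);
        }
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hbase.zookeeper.quorum", parts[0]);
        conf.put("hbase.zookeeper.property.clientPort", parts[1]);
        conf.put("zookeeper.znode.parent", parts[2]);
        return conf;
    }

    public static void main(String[] args) {
        // Same shape as the --peer.adr argument in the thread above.
        Map<String, String> conf =
            parseClusterKey("quorum-hostname-B:2181:/hbase");
        conf.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running main prints the three properties that TableOutputFormat#setConf would apply for the destination cluster; per the discussion above, MultiTableOutputFormat never performs this step, which is why the output quorum is ignored.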
Ted Yu 2013-03-27, 17:40
David Koch 2013-03-31, 17:14
Ted Yu 2013-03-31, 19:38