Hadoop, mail # user - Custom TableInputFormat not working correctly


Re: Custom TableInputFormat not working correctly
edward choi 2011-06-21, 00:10
Hello, St.Ack

I finally found the reason at 3:00 am last night.
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob() is
hardcoded to set "TableInputFormat.class" as the InputFormatClass.
I think this is because information about the source HBase table (such as the
table name, the families added, etc.) needs to be delivered to the Map task,
and that is done by setting parameters on the Job.
So I wrote my own initTableMapperJob() instead.
I think there should be a generic method that can accept any custom
TableInputFormat a user creates.
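For illustration, a generic variant could take the input format class as an
extra parameter instead of hardcoding it. The sketch below is hypothetical and
only modeled on TableMapReduceUtil, not the actual HBase API; it assumes it
lives in the org.apache.hadoop.hbase.mapreduce package so it can reach the
package-private convertScanToString():

```java
package org.apache.hadoop.hbase.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;

public class GenericTableMapReduceUtil {

  /**
   * Like TableMapReduceUtil.initTableMapperJob(), but takes the
   * InputFormat class as a parameter instead of hardcoding
   * TableInputFormat.class. (Hypothetical sketch, not part of HBase.)
   */
  @SuppressWarnings("rawtypes")
  public static void initTableMapperJob(String table, Scan scan,
      Class<? extends TableMapper> mapper,
      Class<?> outputKeyClass, Class<?> outputValueClass, Job job,
      Class<? extends InputFormat> inputFormatClass) throws IOException {
    // Use the caller-supplied input format, e.g. TableInputFormatMapPerRow.class
    job.setInputFormatClass(inputFormatClass);
    job.setMapperClass(mapper);
    if (outputKeyClass != null) job.setMapOutputKeyClass(outputKeyClass);
    if (outputValueClass != null) job.setMapOutputValueClass(outputValueClass);
    // Deliver the table name and the serialized Scan to the map tasks
    // through the job configuration, as the stock method does.
    Configuration conf = job.getConfiguration();
    conf.set(TableInputFormat.INPUT_TABLE, table);
    conf.set(TableInputFormat.SCAN,
        TableMapReduceUtil.convertScanToString(scan));
  }
}
```

A caller would then pass TableInputFormatMapPerRow.class (or any other
TableInputFormatBase subclass) as the last argument.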

Ed.

2011/6/21 Stack <[EMAIL PROTECTED]>

> Do you have > 100k rows?
> St.Ack
>
> On Sun, Jun 19, 2011 at 8:49 AM, edward choi <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I have implemented a custom TableInputFormat.
> > I call it TableInputFormatMapPerRow and that is exactly what it does.
> > The getSplits() of my custom TableInputFormat creates a TableSplit for
> > each row in the HBase table.
> > But when I actually run an application with my custom TableInputFormat,
> > far fewer map tasks are launched than there should be.
> > I really don't know what I am doing wrong.
> > Any suggestions please?
> > Below is my TableInputFormatMapPerRow.java
> >
> > Ed
> >
> >
> ----------------------------------------------------------------------------------------------
> >
> > /**
> >  * Copyright 2007 The Apache Software Foundation
> >  *
> >  * Licensed to the Apache Software Foundation (ASF) under one
> >  * or more contributor license agreements.  See the NOTICE file
> >  * distributed with this work for additional information
> >  * regarding copyright ownership.  The ASF licenses this file
> >  * to you under the Apache License, Version 2.0 (the
> >  * "License"); you may not use this file except in compliance
> >  * with the License.  You may obtain a copy of the License at
> >  *
> >  *     http://www.apache.org/licenses/LICENSE-2.0
> >  *
> >  * Unless required by applicable law or agreed to in writing, software
> >  * distributed under the License is distributed on an "AS IS" BASIS,
> >  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> >  * See the License for the specific language governing permissions and
> >  * limitations under the License.
> >  */
> > package org.apache.hadoop.hbase.mapreduce;
> >
> > import java.io.IOException;
> > import java.util.ArrayList;
> > import java.util.List;
> >
> > import org.apache.commons.logging.Log;
> > import org.apache.commons.logging.LogFactory;
> >
> > import org.apache.hadoop.conf.Configurable;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.mapreduce.JobContext;
> > import org.apache.hadoop.mapreduce.InputFormat;
> > import org.apache.hadoop.mapreduce.InputSplit;
> > import org.apache.hadoop.mapreduce.RecordReader;
> > import org.apache.hadoop.mapreduce.TaskAttemptContext;
> > import org.apache.hadoop.util.StringUtils;
> >
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.client.ResultScanner;
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
> > import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> > import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
> > import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> > import org.apache.hadoop.hbase.util.Bytes;
> >
> > /**
> >  * Convert HBase tabular data into a format that is consumable by
> > Map/Reduce.
> >  */
> > public class TableInputFormatMapPerRow extends
> > InputFormat<ImmutableBytesWritable, Result>
> > implements Configurable {
> >
> >  private final Log LOG = LogFactory.getLog(TableInputFormatMapPerRow.class);
> >
> >  /** Job parameter that specifies the input table. */
> >  public static final String INPUT_TABLE = "hbase.mapreduce.inputtable";
> >  /** Base-64 encoded scanner. All other SCAN_ confs are ignored if this is
> > specified.
> >   * See {@link TableMapReduceUtil#convertScanToString(Scan)} for more