HBase >> mail # user >> MapReduce job with mixed data sources: HBase table and HDFS files


Re: MapReduce job with mixed data sources: HBase table and HDFS files
Hi,
1) A single MR job cannot take its input from two different clusters.
2) If your data is located in the same cluster, then:

    // TableInputFormat reads the table name and the serialized Scan from the
    // job configuration.
    conf.set(TableInputFormat.SCAN,
        TableMapReduceUtil.convertScanToString(new Scan()));
    conf.set(TableInputFormat.INPUT_TABLE, tableName);

    // Register one input format and one mapper class per source.
    MultipleInputs.addInputPath(conf, new Path(input_on_hdfs),
        TextInputFormat.class, MapperForHdfs.class);
    MultipleInputs.addInputPath(conf, new Path(input_on_hbase),
        TableInputFormat.class, MapperForHBase.class);

However, new Path(input_on_hbase) can be any path. Its value has no meaning,
because the table to read is set through TableInputFormat.INPUT_TABLE.

For how to read a table in an MR job, please refer to
org.apache.hadoop.hbase.mapreduce.IndexBuilder under $HBASE_HOME/src/example.

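To illustrate what the two mapper classes above have to agree on, here is a
minimal plain-Java sketch with no Hadoop or HBase dependency. It is only a
model, not the real job: MixedSourceModel, mapHdfsLine, mapHBaseRow, shuffle
and the "hdfs:"/"hbase:" tags are hypothetical names, and the tab-separated
line format is an assumption. The point is that both mappers emit the same key
and value types, so tagging each value with its source lets the reducer tell
the records apart.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free model of a job with two input sources; all names are hypothetical.
class MixedSourceModel {

    // Stand-in for MapperForHdfs: one text line, assumed to be "key<TAB>value".
    static Map.Entry<String, String> mapHdfsLine(String line) {
        String[] parts = line.split("\t", 2);
        return new SimpleEntry<>(parts[0], "hdfs:" + parts[1]);
    }

    // Stand-in for MapperForHBase: one table row, modeled as row key plus one cell value.
    static Map.Entry<String, String> mapHBaseRow(String rowKey, String cellValue) {
        return new SimpleEntry<>(rowKey, "hbase:" + cellValue);
    }

    // Stand-in for the shuffle: group the output of both mappers by key, keys sorted.
    static Map<String, List<String>> shuffle(List<Map.Entry<String, String>> mapOutput) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> mapOutput = new ArrayList<>();
        mapOutput.add(mapHdfsLine("user1\tclicked"));
        mapOutput.add(mapHdfsLine("user2\tviewed"));
        mapOutput.add(mapHBaseRow("user1", "Alice"));
        // Each key now carries the values from both sources, as one reduce call would see them.
        System.out.println(shuffle(mapOutput));
        // prints {user1=[hdfs:clicked, hbase:Alice], user2=[hdfs:viewed]}
    }
}
```

In a real job the order of values within one key is not guaranteed, so the
reducer should rely on the tag rather than on position.
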
On Thu, Jul 4, 2013 at 5:19 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> You may want to pull your data out of HBase first in a separate map-only
> job and then use its output along with the other HDFS input.
> There is a significant disparity between the reads from HDFS and from
> HBase.
>
>
> On Jul 3, 2013, at 10:34 AM, S. Zhou <[EMAIL PROTECTED]> wrote:
>
> > Azuryy, I am looking at the MultipleInputs doc, but I could not figure
> > out how to add an HBase table as a Path to the input. Do you have some
> > sample code? Thanks!
> >
> >
> >
> >
> > ________________________________
> > From: Azuryy Yu <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; S. Zhou <[EMAIL PROTECTED]>
> > Sent: Tuesday, July 2, 2013 10:06 PM
> > Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
> >
> >
> > Hi ,
> >
> > Use MultipleInputs, which can solve your problem.
> >
> >
> > On Wed, Jul 3, 2013 at 12:34 PM, S. Zhou <[EMAIL PROTECTED]> wrote:
> >
> >> Hi there,
> >>
> >> I know how to create a MapReduce job with only an HBase table or only
> >> HDFS files as the data source. Now I need to create a MapReduce job with
> >> mixed data sources; that is, this MR job needs to read data from both
> >> HBase and HDFS files. Is it possible? If yes, could you share some sample
> >> code?
> >>
> >> Thanks!
> >> Senqiang
>
>