Re: MapReduce job with mixed data sources: HBase table and HDFS files
Can you use initTableMapperJob() (which calls
TableMapReduceUtil.convertScanToString() internally)?
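For instance, a driver along these lines (table name, paths, and mapper stubs are hypothetical, and this is only a sketch against the 0.94/0.95-era APIs) lets initTableMapperJob() serialize the Scan for you, so you never need to call the package-private convertScanToString() yourself:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MixedSourceDriver {

  // Hypothetical mapper stubs standing in for the ones discussed in this thread.
  static class MapperForHBase extends TableMapper<Text, Text> { /* map() omitted */ }
  static class MapperForHdfs extends Mapper<LongWritable, Text, Text, Text> { /* map() omitted */ }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "mixed-source");  // new Job(conf, name): the API of that era
    job.setJarByClass(MixedSourceDriver.class);

    // initTableMapperJob serializes the Scan into the TableInputFormat.SCAN
    // property internally; convertScanToString() stays hidden from us.
    Scan scan = new Scan();
    scan.setCacheBlocks(false);  // commonly recommended for full-table MR scans
    TableMapReduceUtil.initTableMapperJob(
        "myTable", scan, MapperForHBase.class,  // hypothetical table name
        Text.class, Text.class, job);

    // initTableMapperJob made TableInputFormat the sole input format;
    // MultipleInputs overrides that and registers both sources side by side.
    MultipleInputs.addInputPath(job, new Path("ignored"),      // TableInputFormat never reads this path
        TableInputFormat.class, MapperForHBase.class);
    MultipleInputs.addInputPath(job, new Path("/data/input"),  // hypothetical HDFS input
        TextInputFormat.class, MapperForHdfs.class);

    job.setOutputFormatClass(TextOutputFormat.class);
    TextOutputFormat.setOutputPath(job, new Path("/data/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note this requires the Hadoop and hbase-server jars on the classpath and a running cluster to actually submit, so treat it as a starting point rather than a verified program.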

On Wed, Jul 10, 2013 at 10:15 AM, S. Zhou <[EMAIL PROTECTED]> wrote:

> Hi Azuryy, I am testing the way you suggested. Now I am facing a
> compilation error for the following statement:
> conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(new
> Scan()));
>
>
> The error is: "method convertScanToString is not visible in
> TableMapReduceUtil". Could you help? It is blocking me.
>
>
> BTW, I am using the hbase-server jar, version 0.95.1-hadoop1. I tried
> other versions as well, like 0.94.9, and got the same error.
>
> Thanks!
>
>
> ________________________________
>  From: Azuryy Yu <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Wednesday, July 3, 2013 6:02 PM
> Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS
> files
>
>
> Hi,
> 1) A single MR job cannot take input from two different clusters.
> 2) If your data is located in the same cluster, then:
>
>     conf.set(TableInputFormat.SCAN,
> TableMapReduceUtil.convertScanToString(new Scan()));
>     conf.set(TableInputFormat.INPUT_TABLE, tableName);
>
>     MultipleInputs.addInputPath(conf, new Path(input_on_hdfs),
> TextInputFormat.class, MapperForHdfs.class);
>     MultipleInputs.addInputPath(conf, new Path(input_on_hbase),
> TableInputFormat.class, MapperForHBase.class);
>
> Note that new Path(input_on_hbase) can be any path; TableInputFormat
> ignores it, so the value itself is meaningless.
>
> Please refer to org.apache.hadoop.hbase.mapreduce.IndexBuilder under
> $HBASE_HOME/src/examples for how to read a table in an MR job.
>
>
> On Thu, Jul 4, 2013 at 5:19 AM, Michael Segel <[EMAIL PROTECTED]
> >wrote:
>
> > You may want to pull your data from HBase first in a separate map-only
> > job and then use its output along with the other HDFS input.
> > There is a significant performance disparity between reads from HDFS
> > and reads from HBase.
> >
> >
> > On Jul 3, 2013, at 10:34 AM, S. Zhou <[EMAIL PROTECTED]> wrote:
> >
> > > Azuryy, I am looking at the MultipleInputs doc, but I could not figure
> > > out how to add an HBase table as a Path input. Do you have some sample
> > > code? Thanks!
> > >
> > >
> > >
> > >
> > > ________________________________
> > > From: Azuryy Yu <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]; S. Zhou <[EMAIL PROTECTED]>
> > > Sent: Tuesday, July 2, 2013 10:06 PM
> > > Subject: Re: MapReduce job with mixed data sources: HBase table and
> > > HDFS files
> > >
> > >
> > > Hi ,
> > >
> > > Use MultipleInputs, which can solve your problem.
> > >
> > >
> > > On Wed, Jul 3, 2013 at 12:34 PM, S. Zhou <[EMAIL PROTECTED]> wrote:
> > >
> > >> Hi there,
> > >>
> > >> I know how to create a MapReduce job with either an HBase table or
> > >> HDFS files as the only data source. Now I need to create a MapReduce
> > >> job with mixed data sources, that is, one that reads from both HBase
> > >> and HDFS files. Is it possible? If yes, could you share some sample code?
> > >>
> > >> Thanks!
> > >> Senqiang
> >
> >
>