HBase >> mail # user >> MapReduce job with mixed data sources: HBase table and HDFS files


Re: MapReduce job with mixed data sources: HBase table and HDFS files
Hi,
1) A single MR job cannot take input from two different clusters.
2) If your data is located in the same cluster, then:

    // Tell TableInputFormat which table to read and which Scan to use.
    conf.set(TableInputFormat.SCAN,
        TableMapReduceUtil.convertScanToString(new Scan()));
    conf.set(TableInputFormat.INPUT_TABLE, tableName);

    // Register one input format and one mapper per data source.
    MultipleInputs.addInputPath(conf, new Path(input_on_hdfs),
        TextInputFormat.class, MapperForHdfs.class);
    MultipleInputs.addInputPath(conf, new Path(input_on_hbase),
        TableInputFormat.class, MapperForHBase.class);

Note that new Path(input_on_hbase) can be any path; its value has no
meaning, because the table to read comes from TableInputFormat.INPUT_TABLE.

Please refer to org.apache.hadoop.hbase.mapreduce.IndexBuilder, under
$HBASE_HOME/src/example, for how to read a table in an MR job.

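As a rough illustration of what the MultipleInputs setup above amounts to, here is a plain-Java sketch with no Hadoop or HBase dependency. It assumes both mappers emit the same key and value types (a job has a single map output key/value class), so records from either source can meet under the same key in the reduce phase. The class MixedSourceSketch, the methods mapHdfsLine, mapHBaseRow and run, the tab-separated line layout, and the "hdfs:"/"hbase:" tags are all hypothetical names made up for this sketch, and an HBase row is simplified to a row key plus one string value.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class MixedSourceSketch {

    // Stand-in for MapperForHdfs: parses one "key<TAB>value" text line
    // and tags the value with its origin.
    public static Map.Entry<String, String> mapHdfsLine(String line) {
        String[] parts = line.split("\t", 2);
        String value = parts.length > 1 ? parts[1] : "";
        return new AbstractMap.SimpleEntry<>(parts[0], "hdfs:" + value);
    }

    // Stand-in for MapperForHBase: takes a row key and one cell value
    // (a real mapper would receive ImmutableBytesWritable and Result).
    public static Map.Entry<String, String> mapHBaseRow(String rowKey, String cellValue) {
        return new AbstractMap.SimpleEntry<>(rowKey, "hbase:" + cellValue);
    }

    // Stand-in for the shuffle: each source goes through its own mapper,
    // then all map output is grouped by key, sorted by key.
    public static SortedMap<String, List<String>> run(List<String> hdfsLines,
                                                      Map<String, String> hbaseRows) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (String line : hdfsLines) {
            Map.Entry<String, String> kv = mapHdfsLine(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        for (Map.Entry<String, String> row : hbaseRows.entrySet()) {
            Map.Entry<String, String> kv = mapHBaseRow(row.getKey(), row.getValue());
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> hdfsLines = Arrays.asList("user1\tclicked", "user2\tviewed");
        Map<String, String> hbaseRows = new LinkedHashMap<>();
        hbaseRows.put("user1", "premium");
        hbaseRows.put("user3", "trial");

        // Each key prints once, with the tagged values from both sources.
        run(hdfsLines, hbaseRows).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

In this sketch user1 ends up with one value from each source, which is the point of tagging: the reduce side can tell where each value came from. In a real job the order of values within a key is not guaranteed, so the reducer should rely on the tag rather than on position.
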
On Thu, Jul 4, 2013 at 5:19 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> You may want to pull your data from HBase first in a separate map-only
> job and then use its output along with the other HDFS input.
> There is a significant disparity between the reads from HDFS and from
> HBase.
>
>
> On Jul 3, 2013, at 10:34 AM, S. Zhou <[EMAIL PROTECTED]> wrote:
>
> > Azuryy, I am looking at the MultipleInputs doc, but I could not figure
> > out how to add an HBase table as a Path to the input. Do you have some
> > sample code? Thanks!
> >
> >
> >
> >
> > ________________________________
> > From: Azuryy Yu <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; S. Zhou <[EMAIL PROTECTED]>
> > Sent: Tuesday, July 2, 2013 10:06 PM
> > Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
> >
> >
> > Hi,
> >
> > Use MultipleInputs, which can solve your problem.
> >
> >
> > On Wed, Jul 3, 2013 at 12:34 PM, S. Zhou <[EMAIL PROTECTED]> wrote:
> >
> >> Hi there,
> >>
> >> I know how to create a MapReduce job with only an HBase table or only
> >> HDFS files as the data source. Now I need to create a MapReduce job
> >> with mixed data sources; that is, the MR job needs to read data from
> >> both HBase and HDFS files. Is it possible? If yes, could you share
> >> some sample code?
> >>
> >> Thanks!
> >> Senqiang
>
>