

Re: MapReduce job with mixed data sources: HBase table and HDFS files
Can you utilize initTableMapperJob() (which calls
TableMapReduceUtil.convertScanToString() underneath)?
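[Archive note: a minimal sketch of what that suggestion looks like in practice. The table name and the mapper are placeholders, and this assumes the 0.94/0.95-era HBase mapreduce API discussed in this thread.]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class ScanSetupExample {

  // Placeholder mapper: emits the row key of every row scanned.
  static class RowKeyMapper extends TableMapper<Text, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(new Text(row.get()), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-setup-example");
    job.setJarByClass(ScanSetupExample.class);

    // initTableMapperJob() serializes the Scan into the job configuration
    // (TableInputFormat.SCAN) internally, so the package-private
    // convertScanToString() never has to be called from user code.
    TableMapReduceUtil.initTableMapperJob(
        "myTable",            // table name (placeholder)
        new Scan(),           // scan every row; restrict as needed
        RowKeyMapper.class,   // mapper
        Text.class,           // mapper output key
        NullWritable.class,   // mapper output value
        job);

    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```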

On Wed, Jul 10, 2013 at 10:15 AM, S. Zhou <[EMAIL PROTECTED]> wrote:

> Hi Azuryy, I am testing the way you suggested. Now I am facing a
> compilation error for the following statement:
> conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(new
> Scan()));
>
>
> The error is: "method convertScanToString is not visible in
> TableMapReduceUtil". Could you help? This is blocking me.
>
>
> BTW, I am using the HBase-server jar file version 0.95.1-hadoop1 . I tried
> other versions as well like 0.94.9 and got the same error.
>
> Thanks!
>
>
> ________________________________
>  From: Azuryy Yu <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Wednesday, July 3, 2013 6:02 PM
> Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS
> files
>
>
> Hi,
> 1) A single MR job cannot take input data from two different clusters.
> 2) If your data is located in the same cluster, then:
>
>     conf.set(TableInputFormat.SCAN,
> TableMapReduceUtil.convertScanToString(new Scan()));
>     conf.set(TableInputFormat.INPUT_TABLE, tableName);
>
>     MultipleInputs.addInputPath(conf, new Path(input_on_hdfs),
>         TextInputFormat.class, MapperForHdfs.class);
>     MultipleInputs.addInputPath(conf, new Path(input_on_hbase),
>         TableInputFormat.class, MapperForHBase.class);
>
> But note that new Path(input_on_hbase) can be any path; TableInputFormat
> ignores it, so its value has no meaning.
>
> Please refer to org.apache.hadoop.hbase.mapreduce.IndexBuilder under
> $HBASE_HOME/src/example for how to read a table in an MR job.
>
> On Thu, Jul 4, 2013 at 5:19 AM, Michael Segel <[EMAIL PROTECTED]>
> wrote:
>
> > You may want to pull your data from your HBase first in a separate map
> > only job and then use its output along with other HDFS input.
> > There is a significant disparity between the reads from HDFS and from
> > HBase.
> >
> >
> > On Jul 3, 2013, at 10:34 AM, S. Zhou <[EMAIL PROTECTED]> wrote:
> >
> > > Azuryy, I am looking at the MultipleInputs doc, but I could not figure
> > > out how to add an HBase table as a Path to the input. Do you have some
> > > sample code? Thanks!
> > >
> > >
> > >
> > >
> > > ________________________________
> > > From: Azuryy Yu <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]; S. Zhou <[EMAIL PROTECTED]>
> > > Sent: Tuesday, July 2, 2013 10:06 PM
> > > Subject: Re: MapReduce job with mixed data sources: HBase table and
> > > HDFS files
> > >
> > >
> > > Hi ,
> > >
> > > Use MultipleInputs, which can solve your problem.
> > >
> > >
> > > On Wed, Jul 3, 2013 at 12:34 PM, S. Zhou <[EMAIL PROTECTED]> wrote:
> > >
> > >> Hi there,
> > >>
> > >> I know how to create a MapReduce job with either an HBase table or
> > >> HDFS files as the data source. Now I need to create a MapReduce job
> > >> with mixed data sources, that is, this MR job needs to read data from
> > >> both an HBase table and HDFS files. Is that possible? If yes, could
> > >> you share some sample code?
> > >>
> > >> Thanks!
> > >> Senqiang
> >
> >
>
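[Archive note: putting the advice from this thread together, a minimal driver for the mixed-source job might look like the sketch below. The paths, table name, and mapper bodies are placeholders; it assumes the new-style (org.apache.hadoop.mapreduce) API that MultipleInputs and TableInputFormat share in the HBase versions mentioned above.]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MixedSourceDriver {

  // Handles the HDFS text input. Placeholder logic: emits each line with a 1.
  static class MapperForHdfs
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text line, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(line, new IntWritable(1));
    }
  }

  // Handles the HBase input. Placeholder logic: emits each row key with a 1.
  static class MapperForHBase extends TableMapper<Text, IntWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(new Text(row.get()), new IntWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // TableInputFormat reads the table name (and an optional serialized
    // Scan) from the configuration, not from the Path given to
    // MultipleInputs -- which is why the HBase "path" below is a dummy.
    conf.set(TableInputFormat.INPUT_TABLE, "myTable"); // placeholder table

    Job job = Job.getInstance(conf, "mixed-source-example");
    job.setJarByClass(MixedSourceDriver.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    MultipleInputs.addInputPath(job, new Path("/input/on/hdfs"),
        TextInputFormat.class, MapperForHdfs.class);
    // Dummy path: TableInputFormat never looks at it.
    MultipleInputs.addInputPath(job, new Path("/ignored/hbase"),
        TableInputFormat.class, MapperForHBase.class);

    FileOutputFormat.setOutputPath(job, new Path("/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```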