Re: MapReduce job with mixed data sources: HBase table and HDFS files
bq. disparity between the reads from HDFS and from HBase

Depending on your consistency requirements, the following JIRA should reduce
the disparity, provided that reading slightly out-of-date data from HBase is
acceptable:

HBASE-8369 MapReduce over snapshot files
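
As for the sample code requested further down the thread, here is a minimal,
untested sketch of the MultipleInputs approach. It assumes the new
(org.apache.hadoop.mapreduce) API and relies on TableInputFormat ignoring the
Path that MultipleInputs requires, so the HBase entry is registered under a
placeholder path. The class name MixedSourceJob, both mapper names, the
tab-separated HDFS line format and the placeholder path are hypothetical and
only there for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MixedSourceJob {

  /** Reads "key<TAB>value" lines from HDFS (assumed format) and tags them. */
  public static class HdfsLineMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      String rest = parts.length > 1 ? parts[1] : "";
      ctx.write(new Text(parts[0]), new Text("hdfs\t" + rest));
    }
  }

  /** Reads HBase rows and emits the row key with the number of cells read. */
  public static class HBaseRowMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      String key = Bytes.toString(row.get(), row.getOffset(), row.getLength());
      ctx.write(new Text(key), new Text("hbase\t" + value.size()));
    }
  }

  public static Job buildJob(Configuration conf, String table, Path hdfsIn, Path out)
      throws IOException {
    Job job = Job.getInstance(conf, "mixed HBase + HDFS input");
    job.setJarByClass(MixedSourceJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fetch more rows per RPC for a full scan
    scan.setCacheBlocks(false);  // do not pollute the block cache from MR

    // Stores the table name and serialized Scan in the job configuration.
    TableMapReduceUtil.initTableMapperJob(
        table, scan, HBaseRowMapper.class, Text.class, Text.class, job);

    // MultipleInputs now takes over input format and mapper dispatch.
    // The HBase path is a placeholder (assumption: TableInputFormat ignores it).
    MultipleInputs.addInputPath(job, new Path("/hbase-input-placeholder"),
        TableInputFormat.class, HBaseRowMapper.class);
    MultipleInputs.addInputPath(job, hdfsIn, TextInputFormat.class, HdfsLineMapper.class);

    // Both mappers must emit the same key/value types.
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, out);
    return job;
  }

  public static void main(String[] args) throws Exception {
    // args: <hbase table> <hdfs input dir> <output dir>
    Job job = buildJob(HBaseConfiguration.create(), args[0],
        new Path(args[1]), new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Tagging each value with its source ("hdfs" or "hbase") lets a reducer tell the
two inputs apart when it joins them on the key. If the HBase scan turns out to
dominate the run time, the two-step approach described below (a map-only
export first) may still be the better option.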

Cheers
> On Thu, Jul 4, 2013 at 5:19 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
> > You may want to pull your data from HBase first in a separate map-only
> > job and then use its output along with the other HDFS input.
> > There is a significant disparity between the reads from HDFS and from
> > HBase.
> >
> >
> > On Jul 3, 2013, at 10:34 AM, S. Zhou <[EMAIL PROTECTED]> wrote:
> >
> > > Azuryy, I am looking at the MultipleInputs doc, but I could not figure
> > > out how to add an HBase table as a Path to the input. Do you have some
> > > sample code? Thanks!
> > >
> > >
> > >
> > >
> > > ________________________________
> > > From: Azuryy Yu <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]; S. Zhou <[EMAIL PROTECTED]>
> > > Sent: Tuesday, July 2, 2013 10:06 PM
> > > Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
> > >
> > >
> > > Hi,
> > >
> > > Use MultipleInputs, which can solve your problem.
> > >
> > >
> > > On Wed, Jul 3, 2013 at 12:34 PM, S. Zhou <[EMAIL PROTECTED]> wrote:
> > >
> > >> Hi there,
> > >>
> > >> I know how to create a MapReduce job with only an HBase table or only
> > >> HDFS files as the data source. Now I need to create a MapReduce job
> > >> with mixed data sources, that is, this MR job needs to read data from
> > >> both HBase and HDFS files. Is it possible? If yes, could you share some
> > >> sample code?
> > >>
> > >> Thanks!
> > >> Senqiang
> >
> >
>