Weishung Chung 2012-08-10, 13:10
Hi HBase users,
I need to pull data from two HBase tables in a single MapReduce job. For a single input table, I use TableMapReduceUtil.initTableMapperJob. Is there an equivalent method for multi-table input?
Thank you, Wei Shung
Amandeep Khurana 2012-08-10, 13:12
How do you want to use the two tables? Can you explain your algorithm a bit?
Weishung Chung 2012-08-10, 13:22
Basically a join of two data sets on the same row key.
Amandeep Khurana 2012-08-10, 13:29
You can scan over one of the tables (using TableInputFormat) and do simple gets on the other table for every row that you want to join.
An interesting question to address here is why you need a join at all. Can you talk more about the data and what you are trying to do? In general, you want to denormalize so that you do not need joins when working with HBase (or, for that matter, most NoSQL stores).
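The scan-plus-gets approach described above can be sketched without any HBase classes. This is only an in-memory illustration: the sorted map stands in for the table being scanned, the second map stands in for the table that receives one get per scanned row, and the class name, method name, and sample rows are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the scan-plus-get join; no HBase classes are used.
class ScanGetJoin {

    // Walk the scanned table in row-key order and do one point lookup in the
    // other table per row. Rows without a match are dropped (inner join).
    static List<String> join(TreeMap<String, String> scanned, Map<String, String> lookedUp) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> row : scanned.entrySet()) {
            String other = lookedUp.get(row.getKey()); // plays the role of one get per row
            if (other != null) {
                joined.add(row.getKey() + "\t" + row.getValue() + "\t" + other);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        TreeMap<String, String> users = new TreeMap<>();
        users.put("row1", "alice");
        users.put("row2", "bob");
        Map<String, String> orders = new TreeMap<>();
        orders.put("row2", "order-17");
        orders.put("row3", "order-42");
        for (String line : join(users, orders)) {
            System.out.println(line);
        }
    }
}
```

The cost of this pattern is one lookup per scanned row, so it tends to make sense to scan the smaller of the two tables.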
Weishung Chung 2012-08-10, 13:39
Thank you. I am trying to avoid fetching with gets and would like to do something like Hadoop's MultipleInputs. Yes, it would be nice if I could denormalize and remodel the schema.
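For context, a MultipleInputs-style job usually ends in a reduce-side join: each mapper tags its records with the source they came from, the shuffle groups records by key, and the reducer pairs up the two sides. The sketch below models that flow in plain Java with no Hadoop or HBase classes. The tags "A" and "B", the class and method names, and the sample rows are assumptions for illustration, and it assumes each row key appears at most once per table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of a tagged reduce-side join; no Hadoop classes are used.
class TaggedJoin {

    // One map-output value: the originating table plus the payload.
    static final class Tagged {
        final String source;
        final String value;

        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    // "Map" and "shuffle": tag every row with its source table, then group by row key.
    static Map<String, List<Tagged>> mapAndShuffle(Map<String, String> tableA,
                                                   Map<String, String> tableB) {
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : tableA.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(new Tagged("A", e.getValue()));
        }
        for (Map.Entry<String, String> e : tableB.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(new Tagged("B", e.getValue()));
        }
        return grouped;
    }

    // "Reduce": emit a joined line only for keys that have a value from both tables.
    static List<String> reduce(Map<String, List<Tagged>> grouped) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> group : grouped.entrySet()) {
            String a = null;
            String b = null;
            for (Tagged t : group.getValue()) {
                if (t.source.equals("A")) {
                    a = t.value;
                } else {
                    b = t.value;
                }
            }
            if (a != null && b != null) {
                joined.add(group.getKey() + "\t" + a + "\t" + b);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        Map<String, String> tableA = new TreeMap<>();
        tableA.put("r1", "a1");
        tableA.put("r2", "a2");
        Map<String, String> tableB = new TreeMap<>();
        tableB.put("r2", "b2");
        tableB.put("r3", "b3");
        for (String line : reduce(mapAndShuffle(tableA, tableB))) {
            System.out.println(line);
        }
    }
}
```

Unlike the per-row get approach, both tables are read sequentially here; the price is that every row from both tables goes through the shuffle.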
On Fri, Aug 10, 2012 at 6:29 AM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> You can scan over one of the tables (using TableInputFormat) and do simple > gets on the other table for every row that you want to join. > > An interesting question to address here would be - why even need a join. > Can you talk more about the data and what you are trying to do? In general > you really want to denormalize and not need joins when working with HBase > (or for that matter most NoSQL stores). > > On Fri, Aug 10, 2012 at 6:52 PM, Weishung Chung <[EMAIL PROTECTED]> > wrote: > > > Basically a join of two data sets on the same row key. > > > > On Fri, Aug 10, 2012 at 6:12 AM, Amandeep Khurana <[EMAIL PROTECTED]> > > wrote: > > > > > How do you want to use two tables? Can you explain your algo a bit? > > > > > > On Fri, Aug 10, 2012 at 6:40 PM, Weishung Chung <[EMAIL PROTECTED]> > > > wrote: > > > > > > > Hi HBase users, > > > > > > > > I need to pull data from 2 HBase tables in a mapreduce job. For 1 > table > > > > input, I use TableMapReduceUtil.initTableMapperJob. Is there another > > > method > > > > for multitable inputs ? > > > > > > > > Thank you, > > > > Wei Shung > > > > > > > > > >
+
Weishung Chung 2012-08-10, 13:39
Weishung Chung 2012-08-10, 13:41
But the tables are in production now.
Bryan Beaudreault 2012-08-10, 13:50
Use three jobs: one to scan each table, and a third to do a map-side join on their outputs. Make sure the first two jobs use the same sort order and partitioning.
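The reason the sort order and partitioning have to match is that a map-side join is a merge: partition i of one output is walked in lockstep with partition i of the other. A minimal sketch of that merge step in plain Java follows, assuming string row keys that are unique within each input. The class and method names are hypothetical, and the partition helper only illustrates a hash-style partitioner; it is not taken from either job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory model of the merge step of a map-side join; no Hadoop classes are used.
class MergeJoin {

    // Both jobs must send a given row key to the same partition number,
    // otherwise matching rows end up in files that are never merged together.
    static int partition(String rowKey, int numPartitions) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Inner-join two partitions whose {rowKey, value} records are already
    // sorted by row key, using a two-pointer merge.
    static List<String> join(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++; // left key has no partner yet, advance left
            } else if (cmp > 0) {
                j++; // right key has no partner yet, advance right
            } else {
                out.add(left.get(i)[0] + "\t" + left.get(i)[1] + "\t" + right.get(j)[1]);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, already sorted by row key.
        List<String[]> left = Arrays.asList(new String[] {"a", "1"}, new String[] {"c", "3"});
        List<String[]> right = Arrays.asList(new String[] {"b", "x"}, new String[] {"c", "y"});
        for (String line : join(left, right)) {
            System.out.println(line);
        }
    }
}
```

Because each side is read once and in order, the merge needs no shuffle and no random reads, which is the main attraction of the three-job layout.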
Jerry Lam 2012-08-10, 14:20
Hi Wei:
There is a JIRA, HBASE-3996. Does this sound like what you are looking for?
Regards,
Jerry
Weishung Chung 2012-08-10, 14:52
Yes, this looks like a good solution. But I am running CDH3, and the upgrade is not scheduled until next year.