|
Weishung Chung
2012-08-10, 13:10
Amandeep Khurana
2012-08-10, 13:12
Weishung Chung
2012-08-10, 13:22
Amandeep Khurana
2012-08-10, 13:29
Weishung Chung
2012-08-10, 13:39
Weishung Chung
2012-08-10, 13:41
Bryan Beaudreault
2012-08-10, 13:50
Jerry Lam
2012-08-10, 14:20
Weishung Chung
2012-08-10, 14:52
|
-
multitable query Weishung Chung 2012-08-10, 13:10
Hi HBase users,
I need to pull data from 2 HBase tables in a MapReduce job. For 1 table input, I use TableMapReduceUtil.initTableMapperJob. Is there another method for multi-table inputs?
Thank you,
Wei Shung
-
Re: multitable query Amandeep Khurana 2012-08-10, 13:12
How do you want to use two tables? Can you explain your algo a bit?
-
Re: multitable query Weishung Chung 2012-08-10, 13:22
Basically a join of two data sets on the same row key.
-
Re: multitable query Amandeep Khurana 2012-08-10, 13:29
You can scan over one of the tables (using TableInputFormat) and do simple gets on the other table for every row that you want to join.
An interesting question to address here would be: why do you even need a join? Can you talk more about the data and what you are trying to do? In general, you really want to denormalize and not need joins when working with HBase (or, for that matter, most NoSQL stores).
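The scan-plus-gets approach described above amounts to a lookup join. A minimal sketch of that logic follows, in plain Java so it runs without a cluster: the two maps are stand-ins for the two HBase tables, and the class and method names are hypothetical. In a real job the loop body would live in a mapper fed by TableInputFormat, and the lookup would be a get against the second table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Illustration only: in-memory maps stand in for two HBase tables keyed by row key.
final class LookupJoinSketch {

    // Walk every row of `scanned` in key order (the analogue of a table scan)
    // and do one point lookup per row against `other` (the analogue of a get).
    static List<String> join(NavigableMap<String, String> scanned,
                             Map<String, String> other) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> row : scanned.entrySet()) {
            String match = other.get(row.getKey());
            if (match != null) { // inner join: rows without a partner are skipped
                joined.add(row.getKey() + ":" + row.getValue() + "," + match);
            }
        }
        return joined;
    }
}
```

The cost to keep in mind is one lookup per scanned row, so it usually makes sense to scan the smaller table.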
-
Re: multitable query Weishung Chung 2012-08-10, 13:39
Thank you. I am trying to avoid fetching with gets and would like to do something like Hadoop MultipleInputs.
Yes, it would be nice if I could denormalize and remodel the schema.
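One common way to use several inputs in a single job is a reduce-side join: each mapper tags its records with the table they came from, the shuffle groups them by row key, and the reducer emits a joined row when both sources are present. The sketch below only simulates that flow in plain Java; the class name, the tags "A" and "B", and the map stand-ins for tables are all assumptions for illustration, not HBase or Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: simulates map, shuffle and reduce in memory.
final class TaggedJoinSketch {
    static final String TABLE_A = "A"; // hypothetical source tags
    static final String TABLE_B = "B";

    // "Map" plus "shuffle": tag each value with its source table and
    // group the tagged values by row key.
    static void mapAndShuffle(String source, Map<String, String> table,
                              TreeMap<String, Map<String, String>> grouped) {
        for (Map.Entry<String, String> row : table.entrySet()) {
            grouped.computeIfAbsent(row.getKey(), k -> new TreeMap<>())
                   .put(source, row.getValue());
        }
    }

    // "Reduce": for each row key, emit a joined row only if both tables
    // contributed a value (inner join).
    static List<String> join(Map<String, String> a, Map<String, String> b) {
        TreeMap<String, Map<String, String>> grouped = new TreeMap<>();
        mapAndShuffle(TABLE_A, a, grouped);
        mapAndShuffle(TABLE_B, b, grouped);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : grouped.entrySet()) {
            Map<String, String> bySource = e.getValue();
            if (bySource.containsKey(TABLE_A) && bySource.containsKey(TABLE_B)) {
                out.add(e.getKey() + ":" + bySource.get(TABLE_A)
                        + "," + bySource.get(TABLE_B));
            }
        }
        return out;
    }
}
```

Compared with per-row gets, this reads both tables sequentially but pays for shuffling every row of both tables to the reducers.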
-
Re: multitable query Weishung Chung 2012-08-10, 13:41
but they are in production now
-
Re: multitable query Bryan Beaudreault 2012-08-10, 13:50
Use 3 jobs: 1 to scan each table. The third could do a map-side join. Make sure to use the same sort and partitions on the first two.
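The reason the first two jobs need the same sort and partitions is that the third job can then pair up partition i of one output with partition i of the other and merge them in a single pass. A rough sketch of that idea in plain Java follows; the hash partitioner, the class and method names, and the in-memory maps standing in for job outputs are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: simulates two export jobs and a map-side merge join.
final class MapSideJoinSketch {

    // The same partition function must be applied to both tables.
    static int partition(String rowKey, int numPartitions) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Stand-in for jobs 1 and 2: split a table into partitions,
    // each sorted by row key (natural String order).
    static List<TreeMap<String, String>> export(Map<String, String> table, int n) {
        List<TreeMap<String, String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new TreeMap<>());
        }
        for (Map.Entry<String, String> e : table.entrySet()) {
            parts.get(partition(e.getKey(), n)).put(e.getKey(), e.getValue());
        }
        return parts;
    }

    // Stand-in for job 3: one pass over two sorted partitions, advancing
    // whichever side has the smaller key and emitting on equal keys.
    static List<String> mergeJoin(TreeMap<String, String> left,
                                  TreeMap<String, String> right) {
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> li = left.entrySet().iterator();
        Iterator<Map.Entry<String, String>> ri = right.entrySet().iterator();
        Map.Entry<String, String> l = li.hasNext() ? li.next() : null;
        Map.Entry<String, String> r = ri.hasNext() ? ri.next() : null;
        while (l != null && r != null) {
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp == 0) {
                out.add(l.getKey() + ":" + l.getValue() + "," + r.getValue());
                l = li.hasNext() ? li.next() : null;
                r = ri.hasNext() ? ri.next() : null;
            } else if (cmp < 0) {
                l = li.hasNext() ? li.next() : null;
            } else {
                r = ri.hasNext() ? ri.next() : null;
            }
        }
        return out;
    }

    // Join partition i of one table with partition i of the other.
    static List<String> join(Map<String, String> a, Map<String, String> b, int n) {
        List<TreeMap<String, String>> pa = export(a, n);
        List<TreeMap<String, String>> pb = export(b, n);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.addAll(mergeJoin(pa.get(i), pb.get(i)));
        }
        return out;
    }
}
```

If the two exports used different partition counts or sort orders, matching row keys could land in different partitions and the merge would silently miss them.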
-
Re: multitable query Jerry Lam 2012-08-10, 14:20
Hi Wei:
There is a JIRA, HBASE-3996. Does this sound like something you are looking for?
Regards,
Jerry
-
Re: multitable query Weishung Chung 2012-08-10, 14:52
Yes, this looks like a good solution. But I am running CDH3, and the upgrade is not scheduled until next year.