|
Vishal Kapoor
2011-03-11, 04:08
Ted Dunning
2011-03-11, 06:26
Amandeep Khurana
2011-03-11, 08:44
Lars George
2011-03-11, 16:44
Dave Latham
2011-03-11, 17:23
Vishal Kapoor
2011-03-11, 18:36
Usman Waheed
2011-03-12, 00:25
Bill Graham
2011-03-12, 04:57
Stack
2011-03-12, 05:54
Jesse Daniels
2011-03-13, 20:45
Ted Dunning
2011-03-13, 23:47
|
-
intersection of row ids Vishal Kapoor 2011-03-11, 04:08
Friends,
How do I best compute the intersection of sets of row ids? Suppose I have two tables with similar row ids. How can I get the row ids present in one and not in the other? Do things get better if I have the row ids as values in some qualifier, or as the qualifiers themselves? I hope the question is not too confusing...
The intersection of {1, 2, 3} and {2, 3, 4} is {2, 3}. While {1, 2, 3} are row ids from one table, {2, 3, 4} may come from another table as qualifiers in some row.
Thanks,
Vishal
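The question mentions two different set operations: the intersection (ids present in both places) and the difference (ids present in one and not in the other). A minimal plain-Python sketch of both, using the ids from the example; the variable names are made up for illustration and this is not HBase code:

```python
# Row ids of the first table, and ids stored as qualifiers in the second table.
table_a_rows = {"1", "2", "3"}
table_b_qualifiers = {"2", "3", "4"}

in_both = table_a_rows & table_b_qualifiers    # intersection
only_in_a = table_a_rows - table_b_qualifiers  # present in A, not in B
only_in_b = table_b_qualifiers - table_a_rows  # present in B, not in A

print(sorted(in_both), sorted(only_in_a), sorted(only_in_b))
```

The replies in this thread are about computing these results when the sets are too large to hold in memory as Python-style sets.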
-
Re: intersection of row ids Ted Dunning 2011-03-11, 06:26
You mean like writing a map-reduce program that joins the key sets and outputs what you want?
-
Re: intersection of row ids Amandeep Khurana 2011-03-11, 08:44
You can scan through one table and check whether the other one has those row ids or not.
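The scan-and-check approach can be sketched as follows. This is a plain-Python simulation, assuming each table is a dict keyed by row id; `scan`, `exists`, and `split_by_presence` are hypothetical helpers standing in for a table scan and a point lookup (get), not HBase client calls:

```python
def scan(table):
    """Yield row ids in sorted order, the way a table scan would."""
    yield from sorted(table)


def exists(table, row_id):
    """Stand-in for a point lookup (a get) against the second table."""
    return row_id in table


def split_by_presence(first, second):
    """Scan `first`; sort each row id into hits or misses against `second`."""
    hits, misses = [], []
    for row_id in scan(first):
        if exists(second, row_id):
            hits.append(row_id)
        else:
            misses.append(row_id)
    return hits, misses


first_table = {"1": {}, "2": {}, "3": {}}
second_table = {"2": {}, "3": {}, "4": {}}
hits, misses = split_by_presence(first_table, second_table)
print(hits, misses)
```

The cost is one lookup per scanned row, so it is usually cheaper to scan the smaller table and look up in the larger one.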
-
Re: intersection of row ids Lars George 2011-03-11, 16:44
Hi,
If you expect a lot of misses with that approach, then enable bloom filters on the second table for fast lookups of misses.
Lars
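To illustrate why a Bloom filter helps with misses, here is a toy filter in plain Python. It is only a sketch of the data structure, not how HBase implements or configures its filters; the class name, the bit and hash counts, and the use of SHA-256 are all assumptions chosen for the example:

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


stored_rows = [b"row2", b"row3", b"row4"]
bloom = BloomFilter()
for row in stored_rows:
    bloom.add(row)

# A "no" answer is definitive, so the expensive read can be skipped;
# a "yes" answer still has to be confirmed by actually reading the row.
for candidate in [b"row1", b"row2"]:
    print(candidate, bloom.might_contain(candidate))
```

A lookup for a key that was never added will usually be rejected by the filter without touching the stored data, which is the case Lars is describing.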
-
Re: intersection of row ids Dave Latham 2011-03-11, 17:23
If the ordering of the row ids is the same in both tables and both tables are of the same order of magnitude in size, I would recommend opening scanners on both tables, comparing the current row in each scanner, and advancing whichever scanner is behind. Whenever you hit a match, you output it and advance both scanners. If you need to do it faster, you can move the same approach into an MR job, where you use TableInputFormat for one scanner and open the other one manually in each Mapper.
If one table is orders of magnitude smaller than the other, or the row ids are formatted differently and not ordered the same in each table, then scan the smaller table and issue gets to check for each row in the larger table.
Dave
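The two-scanner approach is a standard sorted-merge intersection. A minimal sketch in plain Python, assuming each scanner is an iterator that yields row keys in the same sort order (byte strings here, compared lexicographically); `merge_intersect` is a hypothetical name, and Python iterators stand in for real table scanners:

```python
def merge_intersect(scanner_a, scanner_b):
    """Yield keys present in both sorted key streams."""
    a = next(scanner_a, None)
    b = next(scanner_b, None)
    while a is not None and b is not None:
        if a == b:
            # Match: output it and advance both scanners.
            yield a
            a = next(scanner_a, None)
            b = next(scanner_b, None)
        elif a < b:
            # Scanner A is behind: advance it.
            a = next(scanner_a, None)
        else:
            # Scanner B is behind: advance it.
            b = next(scanner_b, None)


rows_a = iter([b"row1", b"row2", b"row3"])
rows_b = iter([b"row2", b"row3", b"row4"])
common = list(merge_intersect(rows_a, rows_b))
print(common)
```

Each table is read once, front to back, so the work is proportional to the combined size of the two tables. This is why it suits tables of similar size.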
-
Re: intersection of row ids Vishal Kapoor 2011-03-11, 18:36
Should the Bloom filter be ROW or ROWCOL?
Vishal
-
Re: intersection of row ids Usman Waheed 2011-03-12, 00:25
I suggest ROWCOL, because you have many columns to match against in your second table (column qualifiers).
-Usman
-
Re: intersection of row ids Bill Graham 2011-03-12, 04:57
You could also do this with MR easily using Pig's HBaseStorage and either an inner join or an outer join with a filter on null, depending on whether you want matches or misses, respectively.
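The join semantics behind this suggestion can be illustrated in plain Python. This is not Pig Latin and does not use HBaseStorage; `left_outer_join` is a hypothetical helper that only shows how an outer join followed by a null filter yields the misses, while keeping the non-null rows is equivalent to an inner join:

```python
def left_outer_join(left_keys, right_keys):
    """Pair every left key with its match on the right, or None if absent."""
    right = set(right_keys)
    return [(key, key if key in right else None) for key in left_keys]


joined = left_outer_join(["1", "2", "3"], ["2", "3", "4"])

# Rows with a non-null right side are what an inner join would return.
matches = [left for left, right in joined if right is not None]
# Rows with a null right side are the misses.
misses = [left for left, right in joined if right is None]
print(matches, misses)
```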
-
Re: intersection of row ids Stack 2011-03-12, 05:54
Understand that ROWCOL can use more memory than ROW. In general, blooms could soak up a bunch of your RAM. Just be conscious of this fact.
St.Ack
-
Re: intersection of row ids Jesse Daniels 2011-03-13, 20:45
Has anyone tried the "zig-zag" merge join algorithm that Google uses to do something similar with their AppEngine data store (BigTable)? It's described here starting on slide 29:
http://www.scribd.com/doc/16952419/Building-scalable-complex-apps-on-App-Engine
It seems like you could use this technique across multiple tables to perform equality tests without building additional indices.
Jesse
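A rough sketch of the zig-zag idea in plain Python, as one reading of the technique rather than a transcription of the linked slides: instead of advancing one row at a time, the side that is behind seeks directly to the other side's current key. Sorted lists and `bisect_left` stand in for sorted tables and scanner seeks; `zigzag_intersect` is a hypothetical name:

```python
from bisect import bisect_left


def zigzag_intersect(sorted_a, sorted_b):
    """Intersect two sorted key lists, seeking past gaps instead of stepping."""
    result = []
    i = j = 0
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i] == sorted_b[j]:
            result.append(sorted_a[i])
            i += 1
            j += 1
        elif sorted_a[i] < sorted_b[j]:
            # A is behind: seek A to the first key >= B's current key.
            i = bisect_left(sorted_a, sorted_b[j], i)
        else:
            # B is behind: seek B to the first key >= A's current key.
            j = bisect_left(sorted_b, sorted_a[i], j)
    return result


common_keys = zigzag_intersect([1, 5, 9, 100, 101], [100, 101, 200])
print(common_keys)
```

Compared with stepping both scanners row by row, seeking pays off when the two key sets have long runs that do not overlap.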
-
Re: intersection of row ids Ted Dunning 2011-03-13, 23:47
Well, since you can start iterating from any point, you can just do a map-reduce over the larger table. In each mapper, on the first call, initialize a scanner into the smaller table to start with the key that you get from the larger table. Each time you get a sequential key from the larger (master) table, you can bump the scanner along a bit.
|
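The mapper described here can be simulated in plain Python. This is a sketch under stated assumptions: each mapper receives a sorted run of keys from the larger table, the smaller table is a sorted list, and a list index stands in for the scanner. `IntersectMapper` and `run_split` are hypothetical names, and nothing here uses the Hadoop or HBase APIs:

```python
from bisect import bisect_left


class IntersectMapper:
    """Simulated mapper that lazily opens a scanner on the smaller table."""

    def __init__(self, small_table_keys):
        self.small = small_table_keys  # sorted keys of the smaller table
        self.pos = None                # scanner position, opened on first call

    def map(self, key):
        if self.pos is None:
            # First call: start the scanner at the first key of this split.
            self.pos = bisect_left(self.small, key)
        # Bump the scanner along until it catches up with the incoming key.
        while self.pos < len(self.small) and self.small[self.pos] < key:
            self.pos += 1
        if self.pos < len(self.small) and self.small[self.pos] == key:
            return key
        return None


def run_split(split_keys, small_table_keys):
    """Run one mapper over one sorted split of the larger table."""
    mapper = IntersectMapper(small_table_keys)
    return [k for k in split_keys if mapper.map(k) is not None]


small = ["a", "c", "f", "k", "m"]
first_split = run_split(["a", "b", "c"], small)
second_split = run_split(["j", "k", "l", "m"], small)
print(first_split, second_split)
```

Because each mapper starts its scanner at its own split's first key, the mappers work independently and each reads only the matching key range of the smaller table.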