jamal sasha 2012-10-22, 02:40
Hi,

I am trying to do matrix multiplication using Pig.

Basically, I have data in the form:

data1.txt
item1,item2,0.3
item1, item3, 0.4
item1, item5, 0.6

And then I have other data in the form:

data2.txt
user1,item1
user1,item2
user1,item5
...
user2,item2
etc.

Just to give some context: I am trying to build a top-n recommendation system, which works as follows.

Matrix formed by data2.txt:

        item1  item2  item3  item4  item5
user1   1      1      0      0      1

Matrix formed by data1.txt:

        item1  item2  item3  item4  item5
item1   1      0.3    0.4    0      0.6
item2          1
item3                 1
item4                        1
item5                               1

So the recommendations for user1 would be based on a score computed as follows.

Score for user1 for item1 (ignoring the item1-item1 score):

u12*item_12 + u13*item_13 + u14*item_14 + u15*item_15
= 1*0.3 + 0*0.4 + 0*0 + 1*0.6 = 0.9

Then I find this score for user1 and item2, then for user2 and item1, and so on.

I understand this is more of an implementation challenge, and I am not sure whether this is the right place to ask, but any suggestions will be greatly appreciated.

Thanks,
Jamal
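
For reference, the score described in the question can be written as a matrix product with the diagonal term left out. The symbols U (the user-item matrix built from data2.txt) and S (the item-item matrix built from data1.txt) are names introduced here for illustration; they do not appear in the original message.

```latex
\[
\mathrm{score}(u, i) = \sum_{j \neq i} U_{u,j}\, S_{i,j}
\]
\[
\mathrm{score}(\text{user1}, \text{item1})
  = 1 \cdot 0.3 + 0 \cdot 0.4 + 0 \cdot 0 + 1 \cdot 0.6 = 0.9
\]
```

The four terms correspond to j = item2, item3, item4 and item5, matching the arithmetic in the question.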
-
Re: matrix multiplication
Gunther Hagleitner 2012-10-22, 04:42
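This should work:

-- item-item scores as (row, column, value) triples
matrix = load 'data1.txt' using PigStorage(',') as (row:chararray, column:chararray, value:float);
-- one (user, item) pair per item associated with the user
vectors = load 'data2.txt' using PigStorage(',') as (user:chararray, column:chararray);

-- pair each of a user's items with the matrix entries in the same column
joined = join vectors by column, matrix by column;
-- sum the matched values per (user, row): the user's score for item `row`
groups = group joined by (user, row);
result = foreach groups generate group.user, group.row, (float) SUM(joined.value);

store result into 'result';

Thanks,
Gunther.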
-
Re: matrix multiplication
jamal sasha 2012-10-22, 14:12
Hi,

Great, thanks a lot. How do I sort the result by score and select the top 20 (say)?
-
Re: matrix multiplication
Gunther Hagleitner 2012-10-22, 17:02
That's fairly straightforward. Take a look at: http://pig.apache.org/docs/r0.10.0/basic.html (order by, limit).

Thanks,
Gunther.
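
As a rough sketch of what ORDER BY and LIMIT might look like for this script, assuming the same data1.txt and data2.txt inputs: the aliases scored, sorted and top20, the field names item and score, and the output path 'top20' are introduced here for illustration and are not from the thread. This picks the 20 highest scores across all users.

```pig
-- Same loads, join and grouping as in the script earlier in the thread.
matrix  = load 'data1.txt' using PigStorage(',') as (row:chararray, column:chararray, value:float);
vectors = load 'data2.txt' using PigStorage(',') as (user:chararray, column:chararray);

joined  = join vectors by column, matrix by column;
groups  = group joined by (user, row);

-- Name the output fields so they can be referenced by later statements.
-- (user, item, score are illustrative names.)
scored  = foreach groups generate flatten(group) as (user, item),
                                  (float) SUM(joined.value) as score;

-- Highest scores first, then keep only the first 20 rows overall.
sorted  = order scored by score desc;
top20   = limit sorted 20;

store top20 into 'top20';
```

Naming the fields in the foreach is the main change from the earlier script; without an alias, the summed column would have to be referenced by position (for example `$2`).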
-
Re: matrix multiplication
jamal sasha 2012-10-22, 17:20
Hi,

Thanks for the reply. But how do I sort this within each user group, instead of sorting the entire list by score? And then, for each user group, I want the top 20, rather than the top 20 from the whole list. Any ideas? :(

Thanks
-
Re: matrix multiplication
Gunther Hagleitner 2012-10-22, 22:44
Search for 'nested foreach' statements in the link I sent. You can use ORDER BY and LIMIT within these statements and I think that's what you're looking for.

Thanks,
Gunther.
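
A minimal sketch of the nested foreach approach, under the same assumptions about data1.txt and data2.txt: the aliases scored, by_user, sorted, best and top_n, the field names item and score, and the output path are hypothetical names chosen for illustration. It assumes a Pig release that supports LIMIT inside a nested foreach, as the 0.10.0 documentation linked above describes.

```pig
-- Same loads, join and grouping as in the script earlier in the thread.
matrix  = load 'data1.txt' using PigStorage(',') as (row:chararray, column:chararray, value:float);
vectors = load 'data2.txt' using PigStorage(',') as (user:chararray, column:chararray);

joined  = join vectors by column, matrix by column;
groups  = group joined by (user, row);

-- One (user, item, score) row per user/item pair; names are illustrative.
scored  = foreach groups generate flatten(group) as (user, item),
                                  (float) SUM(joined.value) as score;

-- Collect each user's rows into one bag, then sort and trim inside that bag.
by_user = group scored by user;
top_n   = foreach by_user {
    sorted = order scored by score desc;   -- sort this user's rows only
    best   = limit sorted 20;              -- keep at most 20 per user
    generate flatten(best);                -- emit (user, item, score) rows
};

store top_n into 'top20_per_user';
```

Because the ORDER BY and LIMIT run inside the block, they apply to each user's bag separately, so a user with fewer than 20 scored items simply yields fewer rows.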