Pig >> mail # user >> matrix multiplication


Re: matrix multiplication
Hi,
Thanks for the reply.
But how do I sort by score within each user group, instead of sorting the
entire list?
Then, for each user group, I want the top 20, rather than the top 20 from
the whole list.
Any ideas? :(
Thanks
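
One possible approach to a per-user top 20 is a nested foreach, since Pig allows order and limit inside the nested block. The following is a minimal sketch, not a tested answer: it assumes the `groups` relation from the quoted script below, and the aliases `scored`, `by_user`, `sorted`, `best`, `top20`, the field names `item` and `score`, and the output path 'top20_per_user' are all made up for illustration.

```pig
-- Assumes `groups` is the relation from the quoted script below:
--   groups = group joined by (user, row);
-- All other aliases and field names here are hypothetical.
scored = foreach groups generate
    group.user as user,
    group.row as item,
    (float) SUM(joined.value) as score;

-- One bag of (user, item, score) tuples per user.
by_user = group scored by user;

-- Sort each user's bag by score, highest first, and keep at most 20 tuples.
top20 = foreach by_user {
    sorted = order scored by score desc;
    best = limit sorted 20;
    generate flatten(best);
};

-- Hypothetical output location.
store top20 into 'top20_per_user';
```

Because the order and limit run inside the foreach over `by_user`, they apply to each user's bag separately rather than to the whole relation. The built-in TOP function may be an alternative worth checking in the docs.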

On Monday, October 22, 2012, Gunther Hagleitner <[EMAIL PROTECTED]>
wrote:
> That's fairly straightforward. Take a look at:
> http://pig.apache.org/docs/r0.10.0/basic.html (order by, limit).
>
> Thanks,
> Gunther.
>
> On Mon, Oct 22, 2012 at 7:12 AM, jamal sasha <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> Great. Thanks a lot.
>> How do I sort the result by score and select the top 20 (say)?
>>
>> On Monday, October 22, 2012, Gunther Hagleitner <[EMAIL PROTECTED]> wrote:
>> > This should work:
>> >
>> > matrix = load 'data1.txt' using PigStorage(',') as (row:chararray,
>> > column:chararray, value:float);
>> > vectors = load 'data2.txt' using PigStorage(',') as (user:chararray,
>> > column:chararray);
>> >
>> > joined = join vectors by column, matrix by column;
>> > groups = group joined by (user, row);
>> > result = foreach groups generate group.user, group.row, (float)
>> > SUM(joined.value);
>> >
>> > store result into 'result';
>> >
>> > Thanks,
>> > Gunther.
>> >
>> > On Sun, Oct 21, 2012 at 7:40 PM, jamal sasha <[EMAIL PROTECTED]> wrote:
>> >
>> >> Hi,
>> >>   I am trying to do matrix multiplication using Pig.
>> >>
>> >> Basically I have data in the form:
>> >> data1.txt
>> >> item1,item2,0.3
>> >> item1, item3, 0.4
>> >> item1, item5, 0.6
>> >>
>> >> And then I have other data in the form:
>> >> data2.txt
>> >> user1,item1
>> >> user1,item2
>> >> user1,item5
>> >> ...
>> >> user2,item2
>> >> etc
>> >>
>> >> Just to give some context: I am trying to build a top-n recommendation
>> >> system, which works as follows.
>> >> Matrix formed by data2.txt
>> >>           item1   item2    item3    item4   item5
>> >> user1   1           1           0          0          1
>> >>
>> >>
>> >> Matrix formed by data1.txt
>> >>
>> >>          item1   item2   item3   item4   item5
>> >> item1    1       0.3     0.4     0       0.6
>> >> item2            1
>> >> item3                    1
>> >> item4                            1
>> >> item5                                    1
>> >>
>> >>
>> >> So the recommendation score for user1 is computed as follows.
>> >> Score for user1 for item1 (ignoring the item1,item1 entry) =
>> >>     u12*item_12 + u13*item_13 + u14*item_14 + u15*item_15
>> >>   = 1*0.3 + 0*0.4 + 0*0 + 1*0.6 = 0.9
>> >>
>> >> Then I compute this score for user1 and item2,
>> >>
>> >> then for user2 and item1, and so on.
>> >>
>> >> I understand this is more of an implementation challenge, and I am not
>> >> sure whether this is the right place to ask, but any suggestions will be
>> >> greatly appreciated.
>> >> Thanks
>> >> Jamal
>> >>
>> >
>>
>