Pig, mail # user - matrix multiplication
Re: matrix multiplication
jamal sasha 2012-10-22, 14:12
Hi,
   Great. Thanks a lot.
How do I sort the result by score and select top 20 (say)?
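
One possible approach, as a sketch rather than a tested answer: name the output fields of the quoted script, then group by user and apply order and limit inside a nested foreach. The field names item and score, the aliases by_user, sorted, best and top20, and the output path 'top20' are assumptions introduced here for illustration; they are not part of the original script.

```pig
-- Same pipeline as the quoted script, with the output fields named
matrix  = load 'data1.txt' using PigStorage(',') as (row:chararray, column:chararray, value:float);
vectors = load 'data2.txt' using PigStorage(',') as (user:chararray, column:chararray);

joined = join vectors by column, matrix by column;
groups = group joined by (user, row);
result = foreach groups generate group.user as user, group.row as item,
         (float) SUM(joined.value) as score;

-- Top 20 items per user: order and limit each user's bag inside a nested foreach
by_user = group result by user;
top20 = foreach by_user {
    sorted = order result by score desc;
    best   = limit sorted 20;
    generate flatten(best);
};

store top20 into 'top20';
```

If a single global top 20 is wanted instead of 20 per user, the nested block can be dropped in favour of `sorted = order result by score desc;` followed by `top20 = limit sorted 20;`.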

On Monday, October 22, 2012, Gunther Hagleitner <[EMAIL PROTECTED]>
wrote:
> This should work:
>
> matrix = load 'data1.txt' using PigStorage(',') as (row:chararray, column:chararray, value:float);
> vectors = load 'data2.txt' using PigStorage(',') as (user:chararray, column:chararray);
>
> joined = join vectors by column, matrix by column;
> groups = group joined by (user, row);
> result = foreach groups generate group.user, group.row, (float) SUM(joined.value);
>
> store result into 'result';
>
> Thanks,
> Gunther.
>
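
One caveat, offered as an assumption rather than a confirmed problem: if data1.txt really has a space after each comma, as in the sample rows "item1, item3, 0.4" from the original question, PigStorage(',') keeps that space in the chararray fields, so " item3" would not join with "item3". A minimal sketch of one way to guard against this is to trim the keys before the join. The alias matrix_raw is a hypothetical name used only here.

```pig
-- Load as before; chararray fields may still carry the space that followed the comma
matrix_raw = load 'data1.txt' using PigStorage(',') as (row:chararray, column:chararray, value:float);

-- TRIM is a Pig built-in that strips leading and trailing whitespace.
-- value is left as loaded; worth checking that it parsed, since a field
-- that fails to convert becomes null in Pig.
matrix = foreach matrix_raw generate TRIM(row) as row, TRIM(column) as column, value;
```

The rest of the script can then use matrix unchanged.
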
> On Sun, Oct 21, 2012 at 7:40 PM, jamal sasha <[EMAIL PROTECTED]>
wrote:
>
>> Hi,
>>   I am trying to do matrix multiplication using Pig.
>>
>> Basically I have data in the form:
>> data1.txt
>> item1,item2,0.3
>> item1, item3, 0.4
>> item1, item5, 0.6
>>
>> And then I have another data set in the form
>> data2.txt
>> user1,item1
>> user1,item2
>> user1,item5
>> ...
>> user2,item2
>> etc
>>
>> Just to give some context: I am trying to build a top-n recommendation
>> system, which works as follows.
>> Matrix formed by data2.txt
>>
>>          item1   item2   item3   item4   item5
>> user1    1       1       0       0       1
>>
>> Matrix formed by data1.txt
>>
>>          item1   item2   item3   item4   item5
>> item1    1       0.3     0.4     0       0.6
>> item2            1
>> item3                    1
>> item4                            1
>> item5                                    1
>>
>>
>> So the recommendation score for user1 is computed as follows.
>> Score for user1 for item1 (ignoring the item1, item1 entry)
>>   = u12*item_12 + u13*item_13 + u14*item_14 + u15*item_15
>>   = 1*0.3 + 0*0.4 + 0*0 + 1*0.6
>>   = 0.9
>>
>> Then I compute the same score for user1 and item2,
>> then for user2 and item1, and so on.
>>
>> I understand this is more of an implementation challenge, and I am not sure
>> whether this is the right place to ask, but any suggestions will be
>> greatly appreciated.
>> Thanks
>> Jamal
>>
>