Pig user mailing list: matrix multiplication


Re: matrix multiplication
Hi,

Great, thanks a lot.
How do I sort the result by score and select the top 20 (say)?
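One possible way to do this, offered as an untested sketch: group the `result` relation from the script quoted below by user, then sort and trim each user's bag inside a nested FOREACH. It assumes `result` has the three fields that script generates (user, row item, summed score). The renamed fields `item` and `score`, the aliases `named`, `by_user`, `sorted`, `best`, `top20`, and the output path 'top20' are illustrative choices, not part of the original script.

```pig
-- Assumes `result` is the (user, row, summed value) relation built by the script quoted below.
-- Give the three fields explicit names; $0, $1, $2 are positional references.
named = foreach result generate $0 as user, $1 as item, $2 as score;

-- Top 20 items per user: sort each user's bag by descending score, keep the first 20.
by_user = group named by user;
top20 = foreach by_user {
    sorted = order named by score desc;
    best = limit sorted 20;
    generate flatten(best);
};

store top20 into 'top20';
```

If a single top 20 over all users is wanted instead, `order named by score desc` followed by `limit` on that ordered relation does it without the grouping step.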

On Monday, October 22, 2012, Gunther Hagleitner <[EMAIL PROTECTED]> wrote:
> This should work:
>
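> -- data1.txt: one (row item, column item, similarity) triple per line
> matrix = load 'data1.txt' using PigStorage(',') as (row:chararray, column:chararray, value:float);
> -- data2.txt: one (user, item) pair per line
> vectors = load 'data2.txt' using PigStorage(',') as (user:chararray, column:chararray);
>
> -- match each item a user has against the matrix entries in that item's column
> joined = join vectors by column, matrix by column;
> -- add up the matched entries for every (user, row item) pair
> groups = group joined by (user, row);
> result = foreach groups generate group.user, group.row, (float) SUM(joined.value) as score;
>
> store result into 'result';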
>
> Thanks,
> Gunther.
>
> On Sun, Oct 21, 2012 at 7:40 PM, jamal sasha <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> I am trying to do matrix multiplication using Pig.
>>
>> Basically I have data in the form:
>> data1.txt
>> item1,item2,0.3
>> item1, item3, 0.4
>> item1, item5, 0.6
>>
>> And then I have another dataset in the form:
>> data2.txt
>> user1,item1
>> user1,item2
>> user1,item5
>> ...
>> user2,item2
>> etc
>>
>> To give some context, I am trying to build a top-N recommendation
>> system, which works as follows.
>> Matrix formed by data2.txt:
>>
>>         item1  item2  item3  item4  item5
>> user1     1      1      0      0      1
>>
>> Matrix formed by data1.txt:
>>
>>         item1  item2  item3  item4  item5
>> item1     1     0.3    0.4     0     0.6
>> item2            1
>> item3                   1
>> item4                          1
>> item5                                 1
>>
>>
>> So the recommendation score for user1 is computed as follows (ignoring
>> the item1, item1 entry), where u1j is user1's value for itemj and
>> item_1j is the (item1, itemj) entry of the data1.txt matrix:
>>
>> Score for user1 for item1 = u12*item_12 + u13*item_13 + u14*item_14 + u15*item_15
>>                           = 1*0.3 + 0*0.4 + 0*0 + 1*0.6 = 0.9
>>
>> Then I find this score for user1 and item2,
>>
>> and then for user2 and item1, and so on.
>>
>> I understand this is more of an implementation challenge, and I am not sure
>> whether this is the right place to ask, but any suggestions would be
>> greatly appreciated.
>> Thanks
>> Jamal
>>
>
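Some sample rows of data1.txt above have a space after each comma (for example `item1, item3, 0.4`). PigStorage(',') splits only on the comma and does not strip whitespace, so such a row would load its column as ' item3', which would not join with 'item3' from data2.txt. If the real files look like that, one hedged option is to load every field as chararray and clean it with Pig's built-in TRIM before the join. The aliases `raw_matrix` and `raw_vectors` are illustrative names, not part of the original script.

```pig
-- Load every field as text first so stray whitespace around the commas can be removed.
raw_matrix = load 'data1.txt' using PigStorage(',')
    as (row:chararray, column:chararray, value:chararray);
raw_vectors = load 'data2.txt' using PigStorage(',')
    as (user:chararray, column:chararray);

-- TRIM strips leading and trailing whitespace; the similarity is cast to float afterwards.
matrix = foreach raw_matrix generate TRIM(row) as row, TRIM(column) as column,
    (float) TRIM(value) as value;
vectors = foreach raw_vectors generate TRIM(user) as user, TRIM(column) as column;
```

The cleaned `matrix` and `vectors` relations keep the field names used in the quoted script, so its join, group and SUM steps can follow unchanged.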