Pig user mailing list

Comparing two files using Pig

I have a problem where I need to compare two files and get the count of matching attributes.

For example:
File 1: file1.txt

q1           d1
q1           d2
q2           d3
q2           d1

File 2: file2.txt

q1           d1
q1           d2
q3           d3

What I need is, for each distinct q, the count of d values that are paired with that q in both files.

For example, the output should be:
q1           2  (q1 d1 and q1 d2 appear in both files, hence the count is 2)
q2           0  (no matching d's)
q3           0  (no matching d's)
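Note that q2 appears only in file1 and q3 only in file2, yet both are expected in the output with a count of 0. So a plain inner join on (q, d) is not enough by itself: it would produce only the q1 rows. The result must cover every q from either file. Below is a minimal plain-Python sketch of those semantics (not Pig), useful for checking results on small samples. It assumes whitespace-separated columns and treats repeated (q, d) lines as a single pair; both are assumptions, since the post does not say how duplicates should count. The helper names `load_pairs` and `matching_counts` are hypothetical. In Pig terms this roughly mirrors grouping both inputs by q with COGROUP, which keeps keys that appear in only one input, and then counting the d values common to both groups.

```python
from collections import defaultdict


def load_pairs(lines):
    """Parse 'q d' lines (whitespace-separated) into a dict: q -> set of d values.

    Assumption: duplicate (q, d) lines count once.
    """
    by_q = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            q, d = parts[0], parts[1]
            by_q[q].add(d)
    return by_q


def matching_counts(lines1, lines2):
    """For every q seen in either input, count the d values paired with q in both."""
    a = load_pairs(lines1)
    b = load_pairs(lines2)
    result = {}
    # Union of keys, so q's present in only one file still get a 0.
    for q in sorted(set(a) | set(b)):
        # .get() avoids inserting empty sets into the defaultdicts.
        result[q] = len(a.get(q, set()) & b.get(q, set()))
    return result


# Sample data from the post (tab-separated here).
file1 = ["q1\td1", "q1\td2", "q2\td3", "q2\td1"]
file2 = ["q1\td1", "q1\td2", "q3\td3"]

for q, n in matching_counts(file1, file2).items():
    print(f"{q}\t{n}")
```

On the sample files this yields q1 → 2, q2 → 0 and q3 → 0, which matches the expected output above.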

Any idea how this can be achieved?

Thanks in advance.


=========This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.