Pig >> mail # user >> fuzzy logic through pig programming


RE: fuzzy logic through pig programming
http://www.slideshare.net/Hadoop_Summit/pig-programming-is-fun (Daniel Dai and Thejas Nair) shows how to use the NLTK library from inside Pig. NLTK has functions for computing various string distances, including Levenshtein.

William F Dowling
Senior Technologist
Thomson Reuters

-----Original Message-----
From: j.barrett Strausser [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 27, 2013 2:57 PM
To: [EMAIL PROTECTED]
Subject: Re: fuzzy logic through pig programming

This is called the Levenshtein distance. Since it is a metric, you would be
responsible for deciding how far apart two strings can be and still be
considered the same.
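
To make the distance and the requested percentage concrete, here is a minimal sketch in plain Python. The function names `levenshtein` and `similarity_percent` are illustrative, and dividing by the length of the longer string is only one possible way to turn the distance into a percentage; it is an assumption, not something specified in this thread.

```python
def levenshtein(s1, s2):
    """Return the Levenshtein (edit) distance between two strings."""
    # previous[j] holds the distance between the first i-1 characters
    # of s1 and the first j characters of s2.
    previous = list(range(len(s2) + 1))
    for i in range(1, len(s1) + 1):
        current = [i]
        for j in range(1, len(s2) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_percent(s1, s2):
    """Assumed normalization: 100% for identical strings, scaled by the
    length of the longer string."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (longest - levenshtein(s1, s2)) / longest


if __name__ == "__main__":
    print(levenshtein("Ramesh", "Rahim"))                   # 4
    print(round(similarity_percent("Ramesh", "Rahim"), 2))  # 33.33
    print(similarity_percent("Ramesh", "Ramesh"))           # 100.0
```

Under this normalization 'Ramesh' and 'Ramesh' give 100%, while 'Ramesh' and 'Rahim' are 4 edits apart over a maximum length of 6.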

I'd implement this as a UDF taking two strings, s1 and s2, and a float
threshold f with 0 < f < max(len(s1), len(s2)).
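
A rough sketch of what such a UDF might look like as a Jython script for Pig follows. The file name `fuzzy_udf.py`, the function name `within_distance`, the 1/0 integer return value, and the choice to treat "at most f edits" as a match are all assumptions made for illustration. The `outputSchema` fallback is there only so the file can also be run outside Pig.

```python
# Hypothetical file name: fuzzy_udf.py

try:
    outputSchema  # expected to be provided by Pig's Jython script engine
except NameError:
    # Fallback so the module can be imported and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


def levenshtein(s1, s2):
    """Return the Levenshtein (edit) distance between two strings."""
    previous = list(range(len(s2) + 1))
    for i in range(1, len(s1) + 1):
        current = [i]
        for j in range(1, len(s2) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


@outputSchema("is_match:int")
def within_distance(s1, s2, f):
    """Return 1 if s1 and s2 are at most f edits apart, else 0.

    Null inputs (None) yield None, so nulls stay null in the result.
    """
    if s1 is None or s2 is None or f is None:
        return None
    if levenshtein(s1, s2) <= f:
        return 1
    return 0
```

In a Pig script this could be registered with something like `REGISTER 'fuzzy_udf.py' USING jython AS fuzzy;` and called as `fuzzy.within_distance(name1, name2, 2.0)`, where `name1` and `name2` are placeholder field names.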

On Thu, Jun 27, 2013 at 10:18 AM, Harshit Bhargava <[EMAIL PROTECTED]> wrote:

> Hi,
> I want fuzzy logic in Pig Latin that matches two strings.
> Example 1:
> I have two words, 'Ramesh' and 'Rahim', and I want to check what percentage
> of the strings is equal.
> Example 2:
> If the two words are 'Ramesh' and 'Ramesh', then it should give 100%.
> Kindly provide a solution if one is available.
> Thanks
> Harshit Bhargava

--
https://github.com/bearrito
@deepbearrito