Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # dev >> HIVE-4053 | Review request


Copy link to this message
-
HIVE-4053 | Review request
Hi,

I've implemented 'Refined Soundex' algorithm using a GenericUDF and would
like to share it for a review by experts as I'm a newbie.

Change Details:
A new java class is created: GenericUDFRefinedSoundex.java
Add a entry to FunctionRegistry.java: registerGenericUDF("soundex_ref",
GenericUDFRefinedSoundex.class);

Both files are attached to the email.

I'm planning to implement other phonetic algorithms and submit all as a
single patch. I understand there are many other steps that I need to finish
before a patch is ready but for now, if you could review the attached code
and provide feedback, it'll be great.

Here are the details of Refined Soundex algorithm:
First letter is stored
Subsequent letters are replaced by numbers as defined below-
 * B, P => 1
 * F, V => 2
 * C, K, S => 3
 * G, J => 4
 * Q, X, Z => 5
 * D, T => 6
 * L => 7
 * M, N => 8
 * R => 9
 * Other letters => 0
Consecutive letters belonging to the same group are replaced by one letter

Example:
> SELECT soundex_ref('Carren') FROM src LIMIT 1;
> C30908

Thanks,
Krishna
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB