HIVE-4053 | Review request

I've implemented the 'Refined Soundex' algorithm as a GenericUDF and would
like to share it for review, as I'm new to the project.

Change Details:
A new Java class has been created: GenericUDFRefinedSoundex.java
An entry has been added to FunctionRegistry.java: registerGenericUDF("soundex_ref",

Both files are attached to the email.

I'm planning to implement other phonetic algorithms and submit them all as a
single patch. I understand there are many other steps I need to finish
before a patch is ready, but for now it would be great if you could review
the attached code and provide feedback.

Here are the details of the Refined Soundex algorithm:
The first letter is stored.
Subsequent letters are replaced by digits as defined below:
 * B, P => 1
 * F, V => 2
 * C, K, S => 3
 * G, J => 4
 * Q, X, Z => 5
 * D, T => 6
 * L => 7
 * M, N => 8
 * R => 9
 * Other letters => 0
Consecutive letters belonging to the same group are replaced by a single digit.

> SELECT soundex_ref('Carren') FROM src LIMIT 1;
> C30908
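
For reviewers who want to check the expected output without opening the attachments, the steps above can be sketched in plain Java. This is not the attached GenericUDFRefinedSoundex.java; the class name RefinedSoundexSketch and the method encode are hypothetical, and the Hive GenericUDF wiring is left out. The sketch assumes two things that the description does not state but the 'Carren' example implies: the stored first letter is followed by its own digit (C, then 3), and characters outside A-Z are skipped.

```java
// Hypothetical stand-alone sketch of the Refined Soundex steps described above.
// It is not the attached UDF and uses no Hive classes.
final class RefinedSoundexSketch {

    // Digit for each letter A..Z, indexed by (letter - 'A'), per the table above:
    // B,P=1  F,V=2  C,K,S=3  G,J=4  Q,X,Z=5  D,T=6  L=7  M,N=8  R=9  others=0
    private static final String CODES = "01360240043788015936020505";

    private RefinedSoundexSketch() {
    }

    static String encode(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        char last = '*'; // sentinel that never equals a digit
        for (int i = 0; i < input.length(); i++) {
            char c = Character.toUpperCase(input.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // assumption: anything outside A-Z is ignored
            }
            if (out.length() == 0) {
                out.append(c); // the first letter is stored as-is
            }
            char code = CODES.charAt(c - 'A');
            if (code != last) {
                out.append(code); // consecutive letters of one group collapse to one digit
            }
            last = code;
        }
        return out.toString();
    }
}
```

Tracing 'Carren' through the sketch: C is stored and contributes 3, a gives 0, the two r's collapse into a single 9, e gives 0, and n gives 8, which yields C30908 as in the query above.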

Mark Grover 2013-02-23, 23:14