Re: regexp_replace with unicode chars
Dean Wampler 2013-03-01, 17:31
Anyone know if translate takes ranges, like some implementations? e.g.,
translate ('[a-z]', '[A-Z]')
Of course, that probably doesn't work for non-ascii characters.
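For reference, here is a minimal Java sketch of position-by-position translate semantics in the PostgreSQL style. It assumes Hive's translate behaves the same way, which is worth checking against your Hive version. The class and method names are hypothetical and this is not Hive's actual implementation. Under these semantics, brackets and hyphens are ordinary characters, so '[a-z]' is a five-character set rather than a range.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of PostgreSQL-style translate semantics.
// Assumption: Hive's translate UDF behaves similarly; this is NOT its source.
class TranslateSketch {
    static String translate(String input, String from, String to) {
        int[] fromCps = from.codePoints().toArray();
        int[] toCps = to.codePoints().toArray();

        // Map each code point in `from` to the code point at the same
        // position in `to`; -1 marks "delete" when `to` is shorter.
        Map<Integer, Integer> mapping = new HashMap<>();
        for (int i = 0; i < fromCps.length; i++) {
            // The first occurrence of a repeated character wins.
            mapping.putIfAbsent(fromCps[i], i < toCps.length ? toCps[i] : -1);
        }

        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            int target = mapping.getOrDefault(cp, cp);
            if (target >= 0) {
                out.appendCodePoint(target);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Only 'a' and 'z' are upper-cased; no range expansion happens.
        System.out.println(translate("hello [a-z]", "[a-z]", "[A-Z]"));
    }
}
```

With this behaviour, only the literal characters a and z are mapped, so a range-style call would leave the letters b through y untouched.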
On Fri, Mar 1, 2013 at 11:24 AM, Tom Hall <[EMAIL PROTECTED]> wrote:
> Thanks Dean,
> I don't think translate would work as the set of things to remove is
> Yeah, it's a one-off cleanup job while exporting to try redshift on our
> My guess is it's something about the way hive handles strings? Tried
> "\\ufffd" as the replacement str but no joy either.
> Cheers again,
> On 1 March 2013 17:08, Dean Wampler <[EMAIL PROTECTED]> wrote:
>> I think this should work, but you might investigate using the translate
>> function instead. I suspect it will provide much better performance than
>> using regexps. Also, are you planning to do this once to create your final
>> tables? If so, the performance overhead won't matter much.
>> On Fri, Mar 1, 2013 at 10:52 AM, Tom Hall <[EMAIL PROTECTED]> wrote:
>>> I would like to remove unicode chars that are outside the Basic
>>> Multilingual Plane 
>>> I thought
>>> select regexp_replace(some_column,"[^\\u0000-\\uffff]","\ufffd") from
>>> would work but while the regexp does work the replacement str does not
>>> (I can paste in the literal �, which you may or may not be able to see here
>>> but it somehow did not feel right)
>>> I saw Dean's previous post on using octals, but I think \ufffd is
>>> outside the allowable range.
>> *Dean Wampler, Ph.D.*
*Dean Wampler, Ph.D.*
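To separate the regex question from Hive's string-literal handling, here is a minimal standalone Java sketch of the same replacement. It assumes Hive's regexp_replace is built on java.util.regex, so the pattern should behave the same way there. The class and method names are hypothetical. In plain Java the compiler turns "\uFFFD" into the actual replacement character before the regex engine sees it, which may not be what happens to the literal inside a Hive query.

```java
// Hypothetical standalone check; not Hive code.
class NonBmpSketch {
    // Replace every code point above U+FFFF (outside the Basic
    // Multilingual Plane) with U+FFFD, the replacement character.
    static String replaceNonBmp(String s) {
        // The regex engine sees [^\u0000-\uffff]; java.util.regex matches
        // by code point, so a surrogate pair is matched as one unit.
        return s.replaceAll("[^\\u0000-\\uffff]", "\uFFFD");
    }

    public static void main(String[] args) {
        // U+1F600 is a supplementary (non-BMP) code point.
        String input = "a" + new String(Character.toChars(0x1F600)) + "b";
        String output = replaceNonBmp(input);
        System.out.println(output.equals("a\uFFFDb"));
    }
}
```

If the pattern matches in Hive but the replacement comes out wrong, that would point at how the query's string literal is unescaped rather than at the regex itself.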