David Nickerson 2012-07-06, 16:10
Re: Sanitizing ZooKeeper znode names
I like to use URL encoding. Then I can use the JDK's URLEncoder.
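
A minimal sketch of that idea, assuming UTF-8 and using the JDK's java.net.URLEncoder and java.net.URLDecoder. The class and method names (ZnodeNames, encode, decode) are hypothetical, chosen only for illustration. Note that URLEncoder leaves names such as ".", ".." and "zookeeper" unchanged, and ZooKeeper does not accept those as node names, so they would still need separate handling.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of ZooKeeper or any library.
class ZnodeNames {
    // Percent-encodes a resource name so the result contains no '/'.
    static String encode(String resourceName) {
        try {
            return URLEncoder.encode(resourceName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encode(), recovering the original resource name.
    static String decode(String znodeName) {
        try {
            return URLDecoder.decode(znodeName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("/var/lock/my file");
        System.out.println(encoded);          // %2Fvar%2Flock%2Fmy+file
        System.out.println(decode(encoded));  // /var/lock/my file
    }
}
```

The encoding is reversible, and names made only of letters and digits stay readable when browsing the tree.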

===================
Jordan Zimmerman

On Jul 6, 2012, at 9:11 AM, David Nickerson
<[EMAIL PROTECTED]> wrote:

> I'm writing a distributed locking API based on ZooKeeper. I create nodes
> based on the resource names, but I have no control over what the client
> chooses as their resource name. (Quite often the client uses Linux file
> paths, so I have to remove or escape all of the forward slashes.)
>
> To clean the node names, I wrote a method that escapes the bad characters.
> The method is called 'normalize': http://pastebin.com/hakkb9Nw .
>
> For example, a forward slash becomes \x2f. This method works, but it has a
> few drawbacks. It doesn't deal with Unicode characters greater than 16 bits
> in size, and it's impossible to reverse the escape process. Also,
> crucially, it is possible that two different resources will result in the
> same znode name, which could cause all kinds of trouble.
>
> A more reliable approach would be to convert the resource name into hex.
> For example:
>
> import java.nio.charset.StandardCharsets;
> import javax.xml.bind.DatatypeConverter;
>
> // Name the charset explicitly so every client produces the same bytes.
> DatatypeConverter.printHexBinary(string.getBytes(StandardCharsets.UTF_8))
>
> This would always result in a safe and unique node name. (It will never
> result in the token "zookeeper" because "zookeeper" has an odd number of
> characters.) The only problem with this is that it becomes impossible to
> read and understand the resource names from ZooKeeper unless you reverse
> the process:
>
> new String(DatatypeConverter.parseHexBinary(hex), StandardCharsets.UTF_8)
>
> So I'm wondering, is there a standard or recommended practice for
> sanitizing znode names? If not, which approach would you recommend?
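
For comparison, the hex approach from the quoted message can be sketched with no dependency on javax.xml.bind, which was removed from the JDK in Java 11. This is only an illustration under stated assumptions: the class and method names (HexZnodeNames, toHex, fromHex) are hypothetical, and UTF-8 is assumed so that every client maps the same resource name to the same bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ZooKeeper or any library.
class HexZnodeNames {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Encodes the UTF-8 bytes of the name as uppercase hex, two digits per byte.
    static String toHex(String resourceName) {
        byte[] bytes = resourceName.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    // Reverses toHex(); rejects input that is not an even-length hex string.
    static String fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string");
            }
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hex = toHex("/tmp");
        System.out.println(hex);           // 2F746D70
        System.out.println(fromHex(hex));  // /tmp
    }
}
```

The output contains only the characters 0-9 and A-F, so it can never contain a slash, at the cost of readability described in the quoted message.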