Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Zookeeper >> mail # dev >> DataTree

Hi folks,

I was reading DataTree code last night, and I would like to ask some
snippets I didn't understand, so, please, bear with me, and excuse me in
advance if I say something stupid. I am still finding my way through the
code base.

1. createNode()

 480 HashSet<String> list = ephemerals.get(ephemeralOwner); 481 if (list =null) { 482 list = new HashSet<String>();
483ephemerals.put(ephemeralOwner, list);
484 } 485 synchronized (list) { 486 list.add(path); 487 }

The snippet above includes the path in the list of ephemeral nodes, and
creates an entry for that specific ephemeralOwner (lines 481-484), if it
doesn't exists. Suppose that "ephemerals" is initially empty for a given
ephemeralOwner, and you have two threads arriving, at roughly the same
time, at line 481. Both will modify the same entry (ephemeralOwner), that
is empty, so they will create a new List at line 482.

So far, no problem, because "ephemerals" is a ConcurrentMap so the access
will be thread fase, and the last write will win (overriding the previous
one). The first problem is that as each thread created its own "list" then
each one will synchronize on a different "list" reference at line 485
(defeating the purpose of synchronized for this specific case).

Even here there's no problem because each thread is modifying its own copy
of "list", but as one of the was overwritten at "ephemerals" then one of
the paths will be eventually be lost (because its "list" was overwritten
and not more present in "ephemerals"). I don't know if this specific
situation is really feasible or once in a million chance -- I suppose it
can happen  because if the code is synchronizing at line 485 then it says
it's possible that two threads alter the same Set<String> at the same time.
Again, excuse me and ignore this message, if I said something really
stupid. I am still finding my way through the code base. :)

The solution is I came with is very simple:

 480 HashSet<String> list = ephemerals.get(ephemeralOwner); 481 if (list =null) { 482 list = ephemerals.putIfAbsent(ephemeralOwner, new
HashSet<String>()); 483 484 } 485 synchronized (list) { 486 list.add(path);
487 }

The putIfAbsent() method will work as follow (quoted from javadoc):

     If the specified key is not already associated with a value, associate
it with the given value. This is equivalent to

   if (!map.containsKey(key))
      return map.put(key, value);
      return map.get(key);

Now you will always have the two (or more) threads holding references to
the same "list" object, line 485 works nicely, so that they don't loose any
information to be included

2. The code below leaks the internal keySet of "ephemerals" to the outside
world (I looked at ConcurrentHashMap impl and, to my surprise it's not a
copy as I've always thought, it's the  internal Map's keySett !!! ), and
this is potentially dangerous.

 199 public Collection<Long> getSessions() { 200 return
ephemerals.keySet(); 201 }

I suggest to replace by this

 199 public Collection<Long> getSessions() { 200 return
Collections.unmodifiableSet<Long>(ephemerals.keySet()); 201 }

or this:

 199 public Collection<Long> getSessions() { 200 return
Collections.unmodifiableSet<Long>(new HashSet<Long>(ephemerals.keySet()));
201 }

3. Line 408-409 state the following (but they are more lines like this in

 408 * the session id that owns this node. -1 indicates this is not 409 *
an ephemeral node.

But looking at the souce code you see:

 479 if (ephemeralOwner != 0) {

 1198 if (eowner != 0)

 779 createTxn.getEphemeral() ? header.getClientId() : 0,

Ooops, it's not -1, but 0 (zero) that determines a non-ephemeral node. Am I

Thawan Kooburat 2013-02-14, 05:27
Edward Ribeiro 2013-02-14, 15:01