RE: Can ZK be used for my use case?
Hi Tavi,

If I understood the use case correctly, you have different types of files on an FTP server. There are 300 clients interested in modifications to those files; they watch the files and, when a version changes, fetch the affected files and act on them.

If this is the case, you can create a set of zNodes, one per file, such as /parentNode/file1, /parentNode/file2, /parentNode/file3, etc.
Interested clients can set data watches on these zNodes, and you write your client-side logic to run on receiving the watch notifications.
For example,
Client1 -> /parentNode/file1, /parentNode/file2
Client2 -> /parentNode/file2, /parentNode/file3
Client3 -> /parentNode/file4, /parentNode/file1
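
To make the mapping above concrete, here is a minimal plain-Java sketch (no ZooKeeper calls) that answers "who gets notified when a given zNode changes?". The class name WatchTable and its methods are hypothetical, made up for illustration only; in a real deployment ZooKeeper itself tracks the watches.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: models the client -> watched-zNode table from the example.
final class WatchTable {
    // client name -> zNode paths that client has set data watches on
    private final Map<String, List<String>> watchesByClient = new LinkedHashMap<>();

    void register(String client, String... paths) {
        watchesByClient.put(client, Arrays.asList(paths));
    }

    // Returns the clients that would receive a data watch event when `path` changes.
    Set<String> watchersOf(String path) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : watchesByClient.entrySet()) {
            if (entry.getValue().contains(path)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Builds the three-client table used in the example.
    static WatchTable exampleFromThread() {
        WatchTable table = new WatchTable();
        table.register("Client1", "/parentNode/file1", "/parentNode/file2");
        table.register("Client2", "/parentNode/file2", "/parentNode/file3");
        table.register("Client3", "/parentNode/file4", "/parentNode/file1");
        return table;
    }

    public static void main(String[] args) {
        WatchTable table = exampleFromThread();
        System.out.println(table.watchersOf("/parentNode/file2")); // prints [Client1, Client2]
    }
}
```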

Say Client1 wants to modify /parentNode/file2. It should first acquire a lock on the file (please see the distributed lock recipe using ZooKeeper), then, after the modification, write metadata (host:port, or any other unique key that traces who made the change) to the zNode /parentNode/file2 and release the lock. After the metadata of /parentNode/file2 is updated, Client1 and Client2, which both watch that zNode, would get the data watch notification and can act accordingly.
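
The lock, update, release sequence described here can be sketched in memory as follows. This is only a model under stated assumptions: FileNodeModel is a hypothetical class, the ReentrantLock merely stands in for the ZooKeeper distributed lock recipe, and the version counter imitates the data version that a zNode bumps on every write. The host name and port are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.locks.ReentrantLock;

// In-memory stand-in for one zNode such as /parentNode/file2 (not the ZooKeeper API).
final class FileNodeModel {
    private final ReentrantLock lock = new ReentrantLock(); // stands in for the distributed lock
    private byte[] data = new byte[0];
    private int version = 0; // bumped on every write, like a zNode's data version

    // Acquire the lock, record who changed the file, and always release the lock.
    int recordModification(String host, int port) {
        lock.lock();
        try {
            // ... the file itself would be modified on the FTP server here ...
            data = (host + ":" + port).getBytes(StandardCharsets.UTF_8);
            version++;
            return version;
        } finally {
            lock.unlock();
        }
    }

    // Returns the host:port metadata left by the last writer.
    String lastModifiedBy() {
        lock.lock();
        try {
            return new String(data, StandardCharsets.UTF_8);
        } finally {
            lock.unlock();
        }
    }

    int version() {
        lock.lock();
        try {
            return version;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        FileNodeModel file2 = new FileNodeModel();
        file2.recordModification("client1.example.org", 9001);
        System.out.println(file2.lastModifiedBy() + " v" + file2.version()); // prints client1.example.org:9001 v1
    }
}
```

Releasing the lock in a finally block matters: a client that fails halfway through should not leave the file locked for everyone else.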

All your clients can set child watches on /parentNode to learn about files being added on the FTP server, and then decide whether or not to set data watches on them.
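
A child watch only tells the client that the list of children changed, not what changed, so the client has to re-read the children and compare with what it already knew. A minimal plain-Java sketch of that comparison (ChildDiff is a hypothetical helper, and the file names are placeholders):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: finds files that appeared since the last child listing.
final class ChildDiff {
    // Returns the names present in `current` but not in `known`, i.e. newly added files.
    static Set<String> added(Collection<String> known, Collection<String> current) {
        Set<String> result = new TreeSet<>(current);
        result.removeAll(known);
        return result;
    }

    public static void main(String[] args) {
        Set<String> known = new TreeSet<>(Arrays.asList("file1", "file2"));
        Set<String> current = new TreeSet<>(Arrays.asList("file1", "file2", "file3"));
        System.out.println(added(known, current)); // prints [file3]
    }
}
```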

Always keep this in mind: the default maximum zNode data size is 1MB, and it is recommended to keep much less data than that on a zNode (it is a tunable/configurable parameter, so the user can decide).
ZooKeeper is designed as a high-read, high-throughput system for small data. It is not designed as a large data store for very large values. The 1MB limit is a default config option and can be overridden. Doing so is not advised, but increasing the size a little will probably not damage your system (it all depends on your access patterns, and such changes should be made with care and at your own risk).
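
In practice this means the zNode should carry only a small pointer to the file (name, location, version), never the file contents. A minimal sketch of such a payload with a size guard, assuming a "|"-separated text format and a 1MB limit as quoted above; ZnodePayload, the format, and the example FTP URL are all hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the small metadata payload to store on a zNode.
final class ZnodePayload {
    // Assumed limit of 1MB, taken from the figure quoted above; the real limit is configurable.
    static final int ASSUMED_MAX_BYTES = 1024 * 1024;

    // Encodes "name|location|version" and rejects payloads over the assumed limit.
    static byte[] encode(String fileName, String location, int version) {
        byte[] bytes = (fileName + "|" + location + "|" + version)
                .getBytes(StandardCharsets.UTF_8);
        if (bytes.length > ASSUMED_MAX_BYTES) {
            throw new IllegalArgumentException("payload too large: " + bytes.length + " bytes");
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] payload = encode("report.csv", "ftp://ftp.example.org/app1/report.csv", 7);
        System.out.println(new String(payload, StandardCharsets.UTF_8));
        // prints report.csv|ftp://ftp.example.org/app1/report.csv|7
    }
}
```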

Also, network fluctuations can affect watch notifications: there is a real chance of missing a notification when the connection is unstable, so your clients should handle ZooKeeper connection events carefully.
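
One way a client could cope with this is to not rely on the notification alone: every zNode carries a data version that increases by one on each write, so after any notification or reconnect the client can re-read the zNode and compare that version with the last one it processed. A minimal plain-Java sketch of that bookkeeping (VersionTracker is a hypothetical class; reading the version from ZooKeeper is left out):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers the last data version this client processed per zNode.
final class VersionTracker {
    private final Map<String, Integer> lastSeen = new HashMap<>();

    // Call after every notification and after every reconnect, with the version just read.
    // Returns how many writes happened since the previous call for this path:
    // 0 means nothing new (or first observation), more than 1 means changes were
    // missed or several writes were covered by a single notification.
    int observe(String path, int currentVersion) {
        int previous = lastSeen.getOrDefault(path, currentVersion);
        lastSeen.put(path, currentVersion);
        return currentVersion - previous;
    }

    public static void main(String[] args) {
        VersionTracker tracker = new VersionTracker();
        System.out.println(tracker.observe("/parentNode/file2", 3)); // prints 0 (baseline)
        System.out.println(tracker.observe("/parentNode/file2", 4)); // prints 1
        System.out.println(tracker.observe("/parentNode/file2", 7)); // prints 3
    }
}
```

The same counter also gives a rough idea of how often a file changes between two checks.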

BTW, how many files would be present on the FTP server? Is the number of clients fixed at 300, or does it grow dynamically?

What if a client missed one of the version change notifications and would like to know how frequently the same file is being changed again and again?


-----Original Message-----
From: Martin Kou [mailto:[EMAIL PROTECTED]]
Sent: 06 September 2013 09:23
Subject: Re: Can ZK be used for my use case?

How large are your files? ZooKeeper is generally not designed for storing large data. Also, it doesn't guarantee that watchers will be called when a network outage is involved.

> On Sep 5, 2013, at 7:36 AM, Tavi <[EMAIL PROTECTED]> wrote:
> Hi everyone,
> I have a web application that generates different types of files (in
> different versions) on the server.  Those files must be transferred to
> 300 clients (watchers) and, after the transfer, each client must modify
> every file for its own purposes.  The "client" is a simple standalone
> Java application installed on each user's PC.
> Today my users must manually download their files from an FTP server
> and use a Java library to convert them.  Of course, I don't know
> whether they do this when the file version changes, and I don't have
> any trace of who did it ...
> My idea is to use a ZK server with 300 clients, where each client
> monitors a specific namespace for changes (for example: client 1
> monitors /app1 and /app2, client 2 monitors /app2 only ...).  Every
> namespace will contain the name of the file, its location, its version
> ... When the file version changes, the Java client will use this
> information to retrieve the file from my central FTP and modify it.
> "My leader" needs to know which user is connected and he must