Zookeeper >> mail # dev >> RFC: Behavior of QuotaExceededException

Thawan Kooburat 2013-02-27, 00:41
Flavio Junqueira 2013-02-27, 08:10
Thawan Kooburat 2013-02-27, 19:12
Camille Fournier 2013-03-01, 17:15
Re: RFC: Behavior of QuotaExceededException
Here is how we plan to use quota in our environment.

Currently we use ACLs to lock the root directory. Each team has to make a
request through us to set up its project folder. Eventually we will
enforce authentication, but for now we assume that everybody writes only
to their own project folder.

The existing quota feature only allows soft limits (it logs a warning
message) and does per-folder resource tracking. We expanded the resource
tracking so that we can track reads and writes per folder and export that
information to our external monitoring system. This allows us to set up
per-folder alerts and easily keep track of each team's usage. We want to
expand this to allow hard-limit enforcement too.
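
To make the soft-limit versus hard-limit distinction concrete, here is a
minimal sketch of per-folder tracking. It is not ZooKeeper code: the names
QuotaTracker, FolderQuota, charge and QuotaExceededError are hypothetical,
and it assumes that a write over the hard limit is simply rejected while a
write over the soft limit is only logged.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("quota-sketch")


class QuotaExceededError(Exception):
    """Hypothetical error: the write would exceed a folder's hard limit."""


@dataclass
class FolderQuota:
    # Limits and counters for one project folder (all names are assumptions).
    soft_nodes: int
    hard_nodes: int
    soft_bytes: int
    hard_bytes: int
    nodes: int = 0
    used_bytes: int = 0


class QuotaTracker:
    def __init__(self):
        self.quotas = {}  # folder path -> FolderQuota

    def set_quota(self, folder, quota):
        self.quotas[folder] = quota

    def charge(self, folder, nodes=0, nbytes=0):
        """Account for a write. Returns True while within the soft limit,
        False once the soft limit is passed, and raises at the hard limit."""
        q = self.quotas[folder]
        new_nodes = q.nodes + nodes
        new_bytes = q.used_bytes + nbytes
        # Hard limit: reject the write and leave the counters untouched.
        if new_nodes > q.hard_nodes or new_bytes > q.hard_bytes:
            raise QuotaExceededError(folder)
        q.nodes, q.used_bytes = new_nodes, new_bytes
        # Soft limit: accept the write but log a warning.
        if new_nodes > q.soft_nodes or new_bytes > q.soft_bytes:
            log.warning("soft quota exceeded for %s", folder)
            return False
        return True
```

The counters kept here are also what would be exported to an external
monitoring system for per-folder alerts.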

We get to review most of the clients' application logic to make sure that
they use it properly, so we are not trying to protect against
malicious or improper use. However, the problem that we are trying to
solve is that client requests or usage may spike because of some other
failure, e.g. the application sees an unexpected workload spike, or a
cleanup process wasn't running so the data starts to grow too large. Some
services are more important than others, so we want to make sure that we
have enough capacity for these critical services under those scenarios.
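
One common way to cap a per-folder request rate during such a spike is a
token bucket. The sketch below is only an illustration under assumptions:
the class TokenBucket and its parameters are hypothetical and are not part
of ZooKeeper or of our patch. Each folder would get its own bucket.

```python
import time


class TokenBucket:
    """Allow up to `rate` units per second, with bursts of up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens refilled per second
        self.burst = float(burst)    # maximum tokens the bucket can hold
        self.clock = clock           # injectable so it can be tested
        self.tokens = float(burst)   # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request fits."""
        now = self.clock()
        # Refill for the elapsed time, never beyond the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With cost=1 per request this limits requests/sec; passing the payload size
as cost would limit update bytes/sec in the same way. A rejected request
can succeed again once enough time has passed for the bucket to refill.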

So far I think expiring the offending session is not ideal, but it should
be able to do the job. Let me know if you have any suggestions.

Thawan Kooburat

On 3/1/13 9:15 AM, "Camille Fournier" <[EMAIL PROTECTED]> wrote:

>This is fundamentally one of the huge problems with running "ZooKeeper as
>a service"; I would be very interested to see how you get around it. ZK is
>so sensitive to client (mis)behavior, as I'm sure you're experiencing, that
>it's really difficult to provide it as a service.
>Do you guys control the ZK clients? One thing that I did to get around
>problems like this was actually wrapping the client library into my own,
>more restrictive client. There's not a great way to ensure that the only
>people connecting to your ZK are actually running your client (we need an
>API key or something) but if you're actually wrapping the way people work
>with ZK you can solve a reasonable subset of problems. In fact, when I was
>running a "ZooKeeper as a Service" centralized system, the only clients
>that ever caused me problems were from a group of Perl developers who
>weren't using a client my team had provided and thus totally misused the
>system. This would at least make your second two problems (requests/sec
>and update bytes/sec) less likely to happen.
>For the issue of node count and used bytes, those are currently set on
>subtrees, aren't they? So it doesn't just affect the client connected but
>any client using that subtree? I don't know how anyone can sanely reason
>about a size/byte limit on a node/subtree that someone else is crapping
>over, unless it's tied to ACLs or something. How are you getting around
>On Wed, Feb 27, 2013 at 2:12 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
>> On 2/27/13 12:10 AM, "Flavio Junqueira" <[EMAIL PROTECTED]> wrote:
>> >It wouldn't be very nice to allow holes in the sequence of operations of
>> >a client, it would violate session semantics. I'm also wondering about a
>> >couple of things:
>> >
>> >- What does QuotaExceededException convey to the application? That the
>> >application client won't ever be able to send operations again with that
>> >session? That it won't be able to submit new operations for up to x
>> >amount of time, where x is computed somehow? Expiring the session will
>> >have the side-effect that all the ephemeral nodes will be gone, I'm not
>> >sure that's desirable, but as a punishment it might work out fine. ;-)
>> My initial plan is to support 4 types of hard limits (node count, used
>> bytes, requests/sec and update bytes/sec). For the first 2 types of
>> limits, it is likely that the client won't be able to complete any
>> operation after the quota is exceeded. For the last two, after some amount of time,