Chukwa, mail # dev - Cluster-specific Adaptors


Re: Cluster-specific Adaptors
Bill Graham 2010-09-21, 19:55
Keeping state for a use case like this adds unnecessary complexity in
both the implementation of the client/server and the protocol. Also,
the protocol would be stateful in this case not because doing so
would make the protocol easier to use, but to work around the limits of
the existing protocol, which really could just be changed.

>  It's easy to know when the state can be discarded

I disagree. If I make two HTTP requests, for example, the underlying
HTTP client/browser could use one or more connections, and tracking
that puts an unnecessary burden on the client. You'd instead need to
keep a token to identify each user to bind their state to, which would
need to be passed around, which is a pain. Then how do the server and
the client know when this state is expiring? What if the server gets
bounced? Race conditions? Etc. Yes, there are technical solutions to
all of these things, but they are all far more complex to implement
and use than I think we need in order to handle per-adaptor tags.
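
To make the stateless alternative concrete, here is a rough sketch of
how a single 'add' line using the flag syntax I proposed earlier
(quoted below) could be parsed with no per-session state. It is plain
Python for illustration only, not Chukwa code: parse_add, AddCommand
and the tag value cluster=demo are made-up names, and I'm assuming
every token starting with '--' is reserved.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Optional

# Flags from the proposed syntax; any other "--*" token is rejected (assumption).
KNOWN_FLAGS = {"--tags", "--adaptor-params", "--adaptor-config-file"}


@dataclass
class AddCommand:  # hypothetical container, not a Chukwa class
    adaptor_class: str
    datatype: str
    offset: int
    name: Optional[str] = None
    tags: Optional[str] = None
    adaptor_params: List[str] = field(default_factory=list)
    adaptor_config_file: Optional[str] = None


def parse_add(line: str) -> AddCommand:
    """Parse: add [name =] <class> <datatype> [--flags ...] <initial offset>"""
    tokens = shlex.split(line)  # honours quoted arguments
    if len(tokens) < 4 or tokens[0] != "add":
        raise ValueError("expected: add [name =] <class> <datatype> [flags] <offset>")
    body, offset = tokens[1:-1], int(tokens[-1])

    name = None
    if len(body) >= 2 and body[1] == "=":  # optional "name =" prefix
        name, body = body[0], body[2:]
    if len(body) < 2:
        raise ValueError("missing adaptor class or datatype")
    cmd = AddCommand(adaptor_class=body[0], datatype=body[1], offset=offset, name=name)

    i = 2
    while i < len(body):
        flag = body[i]
        if flag not in KNOWN_FLAGS:
            raise ValueError(f"unexpected token {flag!r}")
        # A flag's values run until the next reserved "--*" token.
        j = i + 1
        while j < len(body) and not body[j].startswith("--"):
            j += 1
        values = body[i + 1:j]
        if flag == "--adaptor-params":
            cmd.adaptor_params = values  # left opaque for the adaptor to interpret
        elif len(values) != 1:
            raise ValueError(f"{flag} takes exactly one value")
        elif flag == "--tags":
            cmd.tags = values[0]
        else:
            cmd.adaptor_config_file = values[0]
        i = j

    if cmd.adaptor_params and cmd.adaptor_config_file:
        raise ValueError("--adaptor-params and --adaptor-config-file are exclusive")
    return cmd


if __name__ == "__main__":
    line = ('add jms.JMSAdaptor jms-events --tags cluster=demo '
            '--adaptor-params -q some.queue.name -s "id_type IN (\'162\')" 0')
    print(parse_add(line))
```

Everything the server needs, tags included, arrives in the one
command, so there is no token to pass around and nothing to expire.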

On Tue, Sep 21, 2010 at 11:54 AM, Ariel Rabkin <[EMAIL PROTECTED]> wrote:
> Why is it bad to keep some state per command session?  It's easy to
> know when the state can be discarded -- as soon as we're done reading
> the file or when the socket closes.
>
> I think it's fairly intuitive and readable; it's routine for e.g.,
> scripts to modify interpreter state.  In straight-line code, which is
> all we're ever going to have in the control protocol, this is very
> easy to reason about.
>
> --Ari
>
> On Tue, Sep 21, 2010 at 11:17 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
>> +1 on staying stateless.
>>
>> I think the challenge we're facing is that we're trying to support a
>> syntax that is simple and readable and can be done with a single line
>> (i.e. for the initial_adaptors file, the telnet API, the command line,
>> etc), but the configs can potentially be not-so-simple.
>>
>> For example, here's how you might configure the JMS adaptor, which uses
>> dependency injection. That's a lot for a single line and there's
>> nowhere to add new global configs in front of the adaptor specific
>> configs without breaking things.
>>
>> add jms.JMSAdaptor jms-events
>> failover:(tcp://jms-host.foo.com:61616,tcp://jms-host.foo.com:61616)
>> -q some.queue.name -s "id_type IN ('162')" -x
>> org.apache.hadoop.chukwa.datacollection.adaptor.jms.JMSMessagePropertyTransformer
>> -p
>> "event_time,id_type,id,srcurl,xref,xrq,title -r event_time,id_type,id"
>> 0
>>
>> What if we were to adopt a few flags into the syntax:
>>
>> add [name =] <adaptor_class_name> <datatype> [--tags <tags>]
>> [--adaptor-params <adaptor specific params>|--adaptor-config-file
>> <file>]
>> <initial offset>
>>
>> The '--*' flags could be reserved. This would allow us to keep with a
>> one-line syntax where that approach works, but allow for expansion.
>> Also, if an adaptor config got too complex, those configs could be
>> specified in a file if needed.
>>
>>
>> On Mon, Sep 20, 2010 at 9:52 PM, Jerome Boulon <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>> If I had to implement this, I would add an extra parameter
>>> (?extraParams=xyz).
>>> The adaptorImp would be the only one responsible for parsing this adaptor’s
>>> specific info.
>>> I don’t think that we could/should add new complexity in the parsing.
>>> The same thing should be done for getCurrentStatus(): a public result that
>>> is the same for all adaptors, in order to know whether the adaptor is working
>>> or not, and a private section that gives extra information.
>>>
>>> Also, moving to a json input should simplify everything.
>>> /Jerome.
>>>
>>> On 9/20/10 5:15 PM, "Bill Graham" <[EMAIL PROTECTED]> wrote:
>>>
>>> I'd like to hear Ari's take on this, but this does feel a bit hacky to
>>> me. Plus, it would put the responsibility of parsing tags on each
>>> adaptor impl and would require a refactor of how each one currently
>>> parses args.
>>>
>>> Actually, we might be able to intercept the call to parseArgs in
>>> AbstractAdaptor and pull out the tags if they exist and pass the rest