

Bill Graham 2010-09-20, 21:58
Eric Yang 2010-09-20, 22:57
Bill Graham 2010-09-21, 00:15
Ariel Rabkin 2010-09-21, 00:32
Eric Yang 2010-09-21, 01:51
Jerome Boulon 2010-09-21, 04:52
Bill Graham 2010-09-21, 18:17
Re: Cluster-specific Adaptors
Why is it bad to keep some state per command session?  It's easy to
know when the state can be discarded -- as soon as we're done reading
the file or when the socket closes.

I think it's fairly intuitive and readable; it's routine for e.g.,
scripts to modify interpreter state.  In straight-line code, which is
all we're ever going to have in the control protocol, this is very
easy to reason about.

--Ari
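
As a rough illustration of the per-session state argued for above, here is a minimal Python sketch. The `set-tags` command and the names `CommandSession` and `run_script` are hypothetical, invented for this example; they are not part of Chukwa's control protocol. The sketch only shows that state set by one line applies to the lines after it and is discarded once the file or socket is finished.

```python
from dataclasses import dataclass, field


@dataclass
class CommandSession:
    """State that lives for one file read or one socket connection."""
    tags: str = ""                               # hypothetical session-wide default
    added: list = field(default_factory=list)    # (adaptor spec, tags) pairs

    def execute(self, line: str) -> None:
        line = line.strip()
        if not line or line.startswith("#"):
            return                               # skip blanks and comments
        verb, _, rest = line.partition(" ")
        if verb == "set-tags":                   # hypothetical command: changes session state
            self.tags = rest.strip()
        elif verb == "add":                      # later adds see the current tags
            self.added.append((rest.strip(), self.tags))
        else:
            raise ValueError(f"unknown command: {verb}")


def run_script(lines):
    """Run straight-line commands; the session is dropped when we return."""
    session = CommandSession()
    for line in lines:
        session.execute(line)
    return session.added
```

Because the commands are straight-line, the tags attached to any `add` are simply whatever the most recent `set-tags` line said, which is the "easy to reason about" property claimed above.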

On Tue, Sep 21, 2010 at 11:17 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
> +1 on staying stateless.
>
> I think the challenge we're facing is that we're trying to support a
> syntax that is simple and readable and can be done with a single line
> (i.e. for the initial_adaptors file, the telnet API, the command line,
> etc), but the configs can potentially be not-so-simple.
>
> For example, here's how you might configure the JMS adaptor which used
> dependency injection. That's a lot for a single line and there's
> nowhere to add new global configs in front of the adaptor specific
> configs without breaking things.
>
> add jms.JMSAdaptor jms-events
> failover:(tcp://jms-host.foo.com:61616,tcp://jms-host.foo.com:61616)
> -q some.queue.name -s "id_type IN ('162')" -x
> org.apache.hadoop.chukwa.datacollection.adaptor.
> jms.JMSMessagePropertyTransformer -p
> "event_time,id_type,id,srcurl,xref,xrq,title -r event_time,id_type,id"
> 0
>
> What if we were to adopt a few flags into the syntax:
>
> add [name =] <adaptor_class_name> <datatype> [--tags <tags>]
> [--adaptor-params <adaptor specific params>|--adaptor-config-file
> <file>]
> <initial offset>
>
> The '--*' flags could be reserved. This would allow us to keep with a
> one-line syntax where that approach works, but allow for expansion.
> Also, if an adaptor config got too complex, those configs could be
> specified in a file if needed.
>
>
> On Mon, Sep 20, 2010 at 9:52 PM, Jerome Boulon <[EMAIL PROTECTED]> wrote:
>> Hi,
>> If I had to implement this, I would add an extra parameter
>> (?extraParams=xyz).
>> The adaptorImp will be the only one responsible for parsing this adaptor’s
>> specific info.
>> I don’t think that we could/should add new complexity in the parsing.
>> The same thing should be done for getCurrentStatus(): a public result that
>> is the same for all adaptors, in order to know whether the adaptor is working
>> or not, and a private section that will give extra information.
>>
>> Also, moving to a json input should simplify everything.
>> /Jerome.
>>
>> On 9/20/10 5:15 PM, "Bill Graham" <[EMAIL PROTECTED]> wrote:
>>
>> I'd like to hear Ari's take on this, but this does feel a bit hacky to
>> me. Plus, it would put the responsibility of parsing tags on each
>> adaptor impl and would require a refactor of how each one currently
>> parses args.
>>
>> Actually, we might be able to intercept the call to parseArgs in
>> AbstractAdaptor and pull out the tags if they exist and pass the rest
>> to the subclass, which would be none the wiser. Not the cleanest, but
>> at least not as intrusive on the adaptor implementations.
>>
>> Ari, also what about the getCurrentStatus() method? I'd think all the
>> impls would somehow need to incorporate tags into that response as
>> well, since AFAIR that's what's used to do Adaptor SerDe with the
>> checkpoints file.
>>
>>
>> On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <[EMAIL PROTECTED]> wrote:
>>> Hi Bill,
>>>
>>> This might be hacky but it should be possible to have adaptor specific
>>> params to include tags.  Ari, what do you think?
>>>
>>> Regards,
>>> Eric
>>>
>>> On 9/20/10 2:58 PM, "Bill Graham" <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> In CHUKWA-515 we discussed the possibility of being able to add an
>>> adaptor bound to a given cluster:
>>>
>>>
>>> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811
>>>
>>> I can actually see this being useful, especially now that it's easier
>>> to add/remove agents with the Adaptor REST API. Looking into the code

Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department
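
The flag-based `add` syntax proposed in Bill's quoted message could be parsed roughly as follows. This is a sketch under stated assumptions, not Chukwa's actual parser: the function name `parse_add`, the returned dictionary layout, the requirement of spaces around `=` in `name =`, and the rule that the last token is always the initial offset are all choices made for this example. Only the three reserved `--*` flags come from the proposal.

```python
import shlex

# Reserved flags from the proposed syntax.
RESERVED = {"--tags", "--adaptor-params", "--adaptor-config-file"}


def parse_add(line):
    """Parse: add [name =] <class> <datatype> [flags...] <initial offset>."""
    tokens = shlex.split(line)            # honors quoted arguments
    if len(tokens) < 4 or tokens[0] != "add":
        raise ValueError("expected: add [name =] <class> <datatype> [flags] <offset>")
    body, offset = tokens[1:-1], int(tokens[-1])   # assumption: offset is last

    name = None
    if len(body) >= 2 and body[1] == "=":          # optional "name ="
        name, body = body[0], body[2:]
    if len(body) < 2:
        raise ValueError("missing adaptor class or datatype")

    # Collect the tokens that follow each reserved flag.
    values, flag = {}, None
    for tok in body[2:]:
        if tok in RESERVED:
            flag = tok
            values[flag] = []
        elif flag is None:
            raise ValueError(f"unexpected token before any flag: {tok}")
        else:
            values[flag].append(tok)

    if "--adaptor-params" in values and "--adaptor-config-file" in values:
        raise ValueError("--adaptor-params and --adaptor-config-file are exclusive")

    config = values.get("--adaptor-config-file")
    return {
        "name": name,
        "adaptor_class": body[0],
        "datatype": body[1],
        "tags": " ".join(values["--tags"]) if "--tags" in values else None,
        "adaptor_params": values.get("--adaptor-params"),   # token list, passed on as-is
        "adaptor_config_file": " ".join(config) if config else None,
        "offset": offset,
    }
```

Adaptor-specific options such as `-q` or `-s` pass through untouched because only the three reserved `--*` flags are interpreted, which is what would let new global options be added later without breaking existing adaptor argument parsing.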
Bill Graham 2010-09-21, 19:55