Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Chukwa >> mail # dev >> Cluster-specific Adaptors


Copy link to this message
-
Re: Cluster-specific Adaptors
+1 on staying stateless.

I think the challenge we're facing is that we're trying to support a
syntax that is simple and readable and can be done with a single line
(i.e. for the initial_adaptors file, the telnet API, the command line,
etc), but the configs can potentially be not-so-simple.

For example, here's how you might configure the JMS adaptor which used
dependency injection. That's a lot for a single line and there's
nowhere to add new global configs in front of the adaptor specific
configs without breaking things.

add jms.JMSAdaptor jms-events
failover:(tcp://jms-host.foo.com:61616,tcp://jms-host.foo.com:61616)
-q some.queue.name -s "id_type IN ('162')" -x
org.apache.hadoop.chukwa.datacollection.adaptor.
jms.JMSMessagePropertyTransformer -p
"event_time,id_type,id,srcurl,xref,xrq,title -r event_time,id_type,id"
0

What if we were to adopt a few flags into the syntax:

add [name =] <adaptor_class_name> <datatype> [--tags <tags>]
[--adaptor-params <adaptor specific params>|--adaptor-config-file
<file>]
<initial offset>

The '--*' flags could be reserved. This would allow us to keep with a
one-line syntax where that approach works, but allow for expansion.
Also, if an adaptor config got to complex, those configs could be
specified in a file if needed.
On Mon, Sep 20, 2010 at 9:52 PM, Jerome Boulon <[EMAIL PROTECTED]> wrote:
> Hi,
> If I had to implement this, I will add an extra parameter
> (?extraParams=xyz).
> The adaptorImp will be the only one responsible for parsing this adaptor’s
> specific info.
> I don’t think that we could/should add new complexity in the parsing.
> The same think should be done for getCurrentStatus(), a public result, that
> is the same for all adaptors in order to know if the adaptors is working or
> not and a private section that will give extra information.
>
> Also, moving to a json input should simplify everything.
> /Jerome.
>
> On 9/20/10 5:15 PM, "Bill Graham" <[EMAIL PROTECTED]> wrote:
>
> I'd like to hear Ari's take on this, but this does feel a bit hacky to
> me. Plus, it would put the responsibility of parsing tags on each
> adaptor impl and would require a refactor of how each one currently
> parses args.
>
> Actually, we might be able to intercept the call to parseArgs in
> AbstractAdaptor and pull out the tags if they exist and pass the rest
> to the subclass, which would be none the wiser. Not the cleanest, but
> at lease not as intrusive on the adaptor implementations.
>
> Ari, also what about the getCurrentStatus() method? I'd think all the
> impls would somehow need to incorporate tags into that response as
> well, since AFAIR that's what's used to do Adaptor SerDe with the
> checkpoints file.
>
>
> On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <[EMAIL PROTECTED]> wrote:
>> Hi Bill,
>>
>> This might be hacky but it should be possible to have adaptor specific
>> params to include tags.  Ari, what do you think?
>>
>> Regards,
>> Eric
>>
>> On 9/20/10 2:58 PM, "Bill Graham" <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> In CHUKWA-515 we discussed the possibility being able to add an
>> adaptor bound to a given cluster:
>>
>>
>> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811
>>
>> I can actually see this being useful, especially now that it's easier
>> to add/remove agents with the Adaptor REST API. Looking into the code
>> it doesn't seem like it would be that hard to do, but I want to make
>> sure I'm not overlooking anything.
>>
>> It seems like we could support this with a few small changes:
>>
>> - Add the concept of tags to the Adaptor interface.
>> - AbstractAdator would support a getTags method which would return the
>> union of tags set on the Adaptor and the default tags on the
>> DataFactory.
>> - Internal tag implementations on each would change to store tags in
>> maps, instead of concat'ed strings. This would allow for a "last in
>> wins" type of functionality so tags could be overriden. This assumes