Re: Any plans on using Apache Helix
Hey Kishore,

I'm really excited about Helix.  It is great to see the toolbox
starting to be filled with such powerful tools.  Some random thoughts
with regards to Helix/Curator/etc.

It seems like we're trying to avoid even supporting a number of things
that the Helix framework provides.  We really want to avoid a master
node.  We hope to avoid the concept of particular nodes holding
specific resources.  (As a query engine, we don't currently have the
concept of things like regions.) We're trying to build upon Berkeley's
Sparrow work and avoid the concept of centralized scheduling.  The
driving node for a particular query is the only entity responsible for
pushing a query to completion and has direct RPC interaction with its
'children'.

Our current use of zookeeper is strictly for the purpose of service
registration and membership information.  If you want to see the (lack
of) complexity of our use right now, you can look here:
https://github.com/apache/incubator-drill/tree/execwork/sandbox/prototype/exec/java-exec/src/main/java/org/apache/drill/exec/coord
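
For a rough sense of what that amounts to, here is a minimal sketch of
ephemeral-node service registration with Curator; the class name and znode
paths are made up for illustration and are not taken from the package above:

    import java.nio.charset.StandardCharsets;
    import java.util.List;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.zookeeper.CreateMode;

    // Hypothetical registrar: each node advertises itself as an ephemeral
    // znode; cluster membership is just the list of children under basePath.
    public class ServiceRegistrar {
      private final CuratorFramework client;
      private final String basePath;

      public ServiceRegistrar(String zkConnect, String basePath) {
        this.client = CuratorFrameworkFactory.newClient(
            zkConnect, new ExponentialBackoffRetry(1000, 3));
        this.basePath = basePath;
        this.client.start();
      }

      // The ephemeral znode disappears when the session dies, so liveness
      // detection falls out of registration for free.
      public void register(String nodeId, String hostPort) throws Exception {
        client.create()
            .creatingParentsIfNeeded()
            .withMode(CreateMode.EPHEMERAL)
            .forPath(basePath + "/" + nodeId,
                     hostPort.getBytes(StandardCharsets.UTF_8));
      }

      public List<String> members() throws Exception {
        return client.getChildren().forPath(basePath);
      }
    }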

Thoughts?

Jacques

On Sun, Apr 21, 2013 at 2:05 PM, kishore g <[EMAIL PROTECTED]> wrote:
> Thanks, Ted, for making the case. I am sure there were valid points.
>
> I did not get the zero-conf option; is the concern that Helix needs to be run
> as a separate service? Helix can be used in both modes: as a service and
> also as a library. We have deployed it in both modes, and we have seen the need
> for it within LinkedIn.
>
> It would be really great if I could get the actual requirements and do
> another pass at evaluating them.
>
> Thanks and appreciate your time in answering my questions.
>
> Thanks,
> Kishore G
>
>
> On Sun, Apr 21, 2013 at 10:35 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>> Kishore,
>>
>> I made the case for Helix and the group seems to have strongly gravitated
>> to the lower level that Curator provides.
>>
>> One feature that would have improved the case for Helix would have been
>> viable zero-conf operation as an option.
>>
>> The game isn't over, however, and if you would like to get involved here on
>> Drill, it might help to have another point of view.
>>
>>
>>
>>
>> On Sun, Apr 21, 2013 at 9:08 AM, kishore g <[EMAIL PROTECTED]> wrote:
>>
>> > Hi Michael,
>> >
>> > Thanks for the update. Here are my thoughts, though I can't resist saying
>> > good things about Helix since I am the author :-).
>> >
>> > Here is how I see ZK vs. Curator vs. Helix.
>> >
>> > ZK is amazing for coordination and for maintaining cluster data like
>> > configuration, etc. It provides the concept of ephemeral nodes, which can
>> > be used for liveness detection of a process. However, there are a lot of
>> > corner cases that are non-trivial to code. Curator is a library that makes
>> > it easy to use those APIs; it provides recipes for leader election,
>> > barriers, etc. Helix provides a much higher abstraction, where it treats
>> > the various components of a distributed system as first-class citizens and
>> > allows system builders to think in terms of nodes, resources, partitions,
>> > state machines, etc. Helix underneath uses zkclient (something like
>> > Curator) to make it easy to interact with ZooKeeper. We had plans to use
>> > Curator, but Helix needed really good performance in terms of start-up and
>> > fail-over time when we have thousands of partitions. We had to use the
>> > low-level ZK APIs to achieve that.
>> >
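>> > For example, the leader-election recipe can be used like this (a minimal
>> > sketch with Curator's LeaderLatch; the connect string and latch path are
>> > just illustrative):
>> >
>> >     import org.apache.curator.framework.CuratorFramework;
>> >     import org.apache.curator.framework.CuratorFrameworkFactory;
>> >     import org.apache.curator.framework.recipes.leader.LeaderLatch;
>> >     import org.apache.curator.retry.ExponentialBackoffRetry;
>> >
>> >     public class LeaderElectionExample {
>> >       public static void main(String[] args) throws Exception {
>> >         CuratorFramework client = CuratorFrameworkFactory.newClient(
>> >             "localhost:2181", new ExponentialBackoffRetry(1000, 3));
>> >         client.start();
>> >
>> >         // Every contender creates a latch on the same path; Curator
>> >         // manages the ephemeral/sequential znodes and corner cases.
>> >         LeaderLatch latch = new LeaderLatch(client, "/example/leader");
>> >         latch.start();
>> >         latch.await();  // blocks until this process becomes leader
>> >
>> >         System.out.println("has leadership: " + latch.hasLeadership());
>> >
>> >         latch.close();   // relinquishes leadership
>> >         client.close();
>> >       }
>> >     }
>> >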
>> > From my experience, while building distributed systems, cluster management
>> > starts out very simple and one can put together a prototype very quickly.
>> > But over time, things get complicated and need many more features. At
>> > LinkedIn we started in a similar way, where we simply used some ephemeral
>> > nodes to know whether we had a lock or not. But over time, a lot of things
>> > like controlling the assignment from outside, evenly distributing locks,
>> > handing over locks gracefully, restricting which nodes can own a partition,