Re: Kafka/ZK Cluster Example
We've been thinking about this stuff a lot recently, at work.

We've had some hard-drive failures in our Kafka cluster. I don't know all the
details, but from what I heard, the drives were mirrored in RAID, and several
of them failed within a short interval, before the array had time to fully
rebuild itself, so we lost all of that data from the Kafka cluster.
Thankfully, the data was being consumed in near real time, so we only
really lost a small unconsumed window of data.

Now, we're wondering what we could improve to prevent this scenario in the
future. I investigated Kafka mirroring, but since it relies on consuming
data, the risk of losing the unconsumed window is still there. If we
had consumers that were more batch-oriented (like Hadoop) rather than
real-time, the benefits of a mirrored Kafka cluster would be greater, but
for our use cases, where data is consumed in near real time, we would still
lose as much data as before. Am I right?
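
A rough way to picture the reasoning above is the sketch below. It assumes a steady produce rate and treats both the downstream consumer and the mirror's embedded consumer as simply lagging the source cluster by some number of seconds. The function names and all the numbers are made up for illustration; nothing here is a Kafka API.

```python
def messages_at_risk(produce_rate: float, lag_seconds: float) -> float:
    """Messages produced but not yet copied off the source cluster."""
    return produce_rate * lag_seconds


def mirror_benefit(produce_rate: float, consumer_lag_s: float, mirror_lag_s: float) -> float:
    """Messages a mirror would save if the source cluster were lost outright.

    Without a mirror, everything the consumer has not read yet is gone.
    With a mirror, only data that neither the consumer nor the mirror
    has read yet is gone.
    """
    without_mirror = messages_at_risk(produce_rate, consumer_lag_s)
    with_mirror = messages_at_risk(produce_rate, min(consumer_lag_s, mirror_lag_s))
    return without_mirror - with_mirror


# Hypothetical numbers: 1000 msgs/s, mirror lagging about 2 s behind the source.
RATE = 1000.0
MIRROR_LAG_S = 2.0

# Near-real-time consumer (about 2 s behind) versus hourly batch consumer.
print("real-time consumer:", mirror_benefit(RATE, 2.0, MIRROR_LAG_S))
print("hourly batch consumer:", mirror_benefit(RATE, 3600.0, MIRROR_LAG_S))
```

Under these assumptions the mirror saves nothing extra when the consumer already keeps up as closely as the mirror does, and saves almost the whole backlog when the consumer is batch-oriented.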

KAFKA-50, with sync replication, would have solved our problem, but until
that's done, what are our options?

I came to the conclusion that simply adding more mirrored copies in our
RAID arrays would be the most cost-effective way to give us both more
availability and more redundancy. This doesn't deal with the scenario where
a machine fails and becomes unavailable, in which case the data on it would
be temporarily unavailable but not lost (although, again, there could be a
small window of uncommitted data). However, in terms of protection against
data loss from HD failures, it seems like the best option for now, no?
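
As a back-of-the-envelope check on the extra-mirror idea, the sketch below estimates the chance that every mirrored copy fails within the same rebuild window. It assumes drive failures are independent, which is likely optimistic: the incident described above, where several drives failed close together, suggests failures can be correlated. The per-drive probability is a made-up figure, not a measurement.

```python
def prob_all_copies_fail(p_fail: float, copies: int) -> float:
    """Chance that all mirrored copies fail in one rebuild window.

    Assumes each drive fails independently with probability p_fail
    during the window (an assumption, and probably an optimistic one).
    """
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_fail ** copies


# Hypothetical 1% chance that a given drive fails during a rebuild window.
P_FAIL = 0.01
for copies in (2, 3, 4):
    print(copies, "copies:", prob_all_copies_fail(P_FAIL, copies))
```

Each extra copy multiplies the loss probability by the per-drive failure probability, but only to the extent that the independence assumption holds.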

It doesn't feel right to just throw more hardware at problems hehe... but I
guess sometimes it's the only choice :) ...

Please tell me if that makes sense!

--
Felix

On Wed, Jan 11, 2012 at 6:16 PM, Felix GV <[EMAIL PROTECTED]> wrote:

> As I understand it, you cannot use a mirrored Kafka cluster as a hot
> fail-over.
>
> You could probably use it as a manual fail-over, but I don't know the
> complexity involved in doing that.
>
> Also, if your source cluster fails while producers were putting data into
> it, there will be an "unconsumed window" of data that is lost. This
> corresponds to the data that the embedded consumer in the mirrored cluster
> did not have time to consume from the source cluster.
>
> All in all, the mirrored cluster is akin to asynchronous replication,
> without any hot fail-over capability. Thus, it provides data redundancy
> (outside of the unconsumed window described above) but no extra
> availability (unless you count manual interventions).
>
> KAFKA-50 <https://issues.apache.org/jira/browse/KAFKA-50>, on the other
> hand, will provide both asynchronous AND synchronous replication (although
> the latter will incur a latency penalty) and will be able to use the
> replicas (data redundancy) as hot fail-overs.
>
> Depending on your personal definition of "highly reliable" (whether it
> includes data redundancy and/or availability), I think that should probably
> answer your question...?
>
> To all the Kafka experts: please correct me if the above explanations are
> incorrect :) !
>
> --
> Felix
>
>
>
>
> On Wed, Jan 11, 2012 at 5:53 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
>> It's just that the mirroring logic depends on ZK to be available most of
>> the time.
>>
>> Jun
>>
>> On Wed, Jan 11, 2012 at 2:35 PM, Christian Carollo <[EMAIL PROTECTED]>
>> wrote:
>>
>> > I see.  But if I used that configuration and then did the mirroring you
>> > suggested would that be enough, in your opinion, to be considered highly
>> > reliable?
>> >
>> > Christian
>> >
>> >
>> > On Jan 11, 2012, at 2:32 PM, Jun Rao wrote:
>> >
>> > >> For example, can I have one ZK instance and one broker on one machine
>> > >> and that is enough to define a ZK cluster and a Kafka Cluster?
>> > >
>> > > Yes, although you don't get the reliability of ZK now.
>> > >
>> > > Jun
>> > >
>> > >
>> > > On Wed, Jan 11, 2012 at 2:06 PM, Christian Carollo <[EMAIL PROTECTED]>
>> > > wrote:
>