Kafka >> mail # user >> Differences in size of data replicated by mirror maker


Re: Differences in size of data replicated by mirror maker
When you state the numbers, are they the same on every broker in the
cluster, meaning that Topic-0 would take up 910*5 GB in the source cluster
and 25*5 GB in the target cluster?
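
One way to answer this is to measure the partition directory on each broker and add up the results. Below is a minimal sketch, assuming the data directory layout from the original message (~/data/Topic-0/); the helper names dir_size_bytes and cluster_total_gb are hypothetical and are not part of Kafka.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path` (recursively)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def cluster_total_gb(per_broker_gb, brokers):
    """Cluster-wide size if every broker holds the same amount of data."""
    return per_broker_gb * brokers


if __name__ == "__main__":
    # Assumed path, taken from the original message; adjust per broker.
    path = os.path.expanduser("~/data/Topic-0")
    if os.path.isdir(path):
        print(f"{path}: {dir_size_bytes(path) / 1024**3:.1f} GiB")
    # The totals implied by the question, if all 5 brokers look alike.
    print("source total GB:", cluster_total_gb(910, 5))
    print("target total GB:", cluster_total_gb(25, 5))
```

Running this on each broker in both clusters would show whether the 910 GB and 25 GB figures hold everywhere or only on the two brokers that were compared.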

Another possibility is that MirrorMaker uses compression on the producer
side, but I would be surprised if the compression ratio could be 25/910.
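
To put numbers on that, here is a rough arithmetic sketch using only the figures reported in the thread. It assumes the destination segments are also about 500 MB (the original message gives the segment size only once) and takes 2.5 minutes as the midpoint of "2 or 3 mins"; the function names are hypothetical.

```python
def shrink_factor(source_gb, target_gb):
    """How many times smaller the target directory is than the source."""
    return source_gb / target_gb


def write_rate_mb_per_min(segment_mb, roll_interval_min):
    """Approximate write rate implied by how often a segment is rolled."""
    return segment_mb / roll_interval_min


# Figures reported in the thread.
SOURCE_GB = 910.0
TARGET_GB = 25.0
SEGMENT_MB = 500.0        # assumed to be the same on both clusters
SOURCE_ROLL_MIN = 2.5     # assumed midpoint of "2 or 3 mins"
TARGET_ROLL_MIN = 25.0

disk_factor = shrink_factor(SOURCE_GB, TARGET_GB)
source_rate = write_rate_mb_per_min(SEGMENT_MB, SOURCE_ROLL_MIN)
target_rate = write_rate_mb_per_min(SEGMENT_MB, TARGET_ROLL_MIN)
rate_factor = source_rate / target_rate

print(f"disk usage differs by a factor of {disk_factor:.1f}")
print(f"source writes ~{source_rate:.0f} MB/min, target ~{target_rate:.0f} MB/min")
print(f"write rate differs by a factor of {rate_factor:.1f}")
```

Under these assumptions the disk usage and the segment roll rate differ by different factors, so it may be worth checking each observation separately.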

Guozhang
On Thu, Aug 22, 2013 at 3:48 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:

> Yes, both the source and target clusters have 5 brokers.
>
> Sent from my iPhone
>
> On Aug 22, 2013, at 6:11 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
>
> > Hello Rajasekar,
> >
> > Are the sizes of the source cluster and target cluster the same?
> >
> > Guozhang
> >
> >
> > On Thu, Aug 22, 2013 at 2:14 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> >
> >> Hi,
> >>
> >> We are using MirrorMaker to replicate data between two Kafka clusters.
> >> I am seeing a huge difference in the size of the log in the data dir
> >> between a broker in the source cluster and a broker in the destination
> >> cluster:
> >>
> >> For example, the size of ~/data/Topic-0/ is about 910 GB on the source
> >> broker, but only 25 GB on the destination broker. I see a segmented log
> >> file (~500 MB) created about every 2 or 3 minutes on the source brokers,
> >> but only about every 25 minutes on the destination broker.
> >>
> >> I verified that MirrorMaker is doing fine using the consumer offset
> >> checker: there is not much lag, and offsets are incrementing. I also
> >> verified that topics/partitions are not under-replicated in either the
> >> source or the target cluster. What is the reason for this difference in
> >> disk usage?
> >>
> >>
> >> --
> >> Thanks,
> >> Raja.
> >
> >
> >
> > --
> > -- Guozhang
>

--
-- Guozhang

 