Kafka >> mail # user >> Differences in size of data replicated by mirror maker


Re: Differences in size of data replicated by mirror maker
Ah, one thing to be aware of is that the effectiveness of compression is
directly related to the producer batch size--more batching, more
compression. So even if you use compression on both clusters the mirror may
be much smaller.

-jay
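
To illustrate the point about batch size, here is a minimal sketch that gzips the same set of messages in batches of different sizes and compares the total compressed bytes. The message contents, the `make_messages` and `compressed_size` helpers, and the batch sizes are all made up for illustration; it uses Python's standard `gzip` module rather than Kafka's producer, so the numbers only show the trend, not what a real cluster would store.

```python
import gzip
import json


def make_messages(n):
    # Hypothetical log-like payloads; real data will compress differently.
    return [
        json.dumps({"id": i, "level": "INFO", "service": "checkout",
                    "msg": "request handled", "status": 200}).encode("utf-8")
        for i in range(n)
    ]


def compressed_size(messages, batch_size):
    """Total bytes after gzip-compressing the messages batch by batch."""
    total = 0
    for start in range(0, len(messages), batch_size):
        batch = b"\n".join(messages[start:start + batch_size])
        total += len(gzip.compress(batch))
    return total


messages = make_messages(2000)
raw = sum(len(m) for m in messages)

# Compare one-message batches against progressively larger batches.
sizes = {b: compressed_size(messages, b) for b in (1, 10, 200)}
for b, s in sizes.items():
    print(f"batch={b}: {s} bytes ({s / raw:.2f} of raw)")
```

Larger batches give the compressor more repeated content to work with and amortize the fixed per-batch gzip overhead, which is one way two clusters that both use gzip can still end up with different on-disk sizes.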

On Friday, August 23, 2013, Rajasekar Elango wrote:

> Thanks Guozhang, Jun,
>
> Yes, we are doing gzip compression, and that should be the reason for the
> difference in disk usage. I had a typo: the size is actually 91G in the
> source cluster. So a 25G/91G ratio makes sense for compression.
>
> Thanks,
> Raja.
>
>
> On Thu, Aug 22, 2013 at 7:00 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
>
> > When you state the numbers, are they the same across instances in the
> > cluster, meaning that Topic-0 would have 910*5 GB in source cluster and
> > 25*5 GB in target cluster?
> >
> > Another possibility is that MirrorMaker uses compression on the producer
> > side, but I would be surprised if the compression rate could be 25/910.
> >
> > Guozhang
> >
> >
> > On Thu, Aug 22, 2013 at 3:48 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> >
> > > Yes, both the source and target clusters have 5 brokers.
> > >
> > > Sent from my iPhone
> > >
> > > On Aug 22, 2013, at 6:11 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hello Rajasekar,
> > > >
> > > > Are the sizes of the source cluster and target cluster the same?
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Thu, Aug 22, 2013 at 2:14 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > > >> We are using MirrorMaker to replicate data between two Kafka clusters.
> > > > >> I am seeing a huge difference in the size of the logs in the data dir
> > > > >> between a broker in the source cluster and a broker in the destination
> > > > >> cluster.
> > > > >>
> > > > >> For example, the size of ~/data/Topic-0/ is about 910 G on the source
> > > > >> broker, but only 25 G on the destination broker. Segmented log files
> > > > >> (~500 M) are created about every 2 or 3 minutes on the source brokers,
> > > > >> but only about every 25 minutes on the destination broker.
> > > > >>
> > > > >> I verified that MirrorMaker is doing fine using the consumer offset
> > > > >> checker: there is not much lag, and offsets are incrementing. I also
> > > > >> verified that topics/partitions are not under-replicated in either the
> > > > >> source or the target cluster. What is the reason for this difference in
> > > > >> disk usage?
> > > >>
> > > >>
> > > >> --
> > > >> Thanks,
> > > >> Raja.
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> Thanks,
> Raja.
>