Kafka, mail # user - Differences in size of data replicated by mirror maker


Re: Differences in size of data replicated by mirror maker
Rajasekar Elango 2013-08-23, 14:12
Thanks Guozhang, Jun,

Yes, we are using gzip compression, and that should be the reason for the
difference in disk usage. I also had a typo: the size is actually 91G in the
source cluster. So a 25G/91G ratio makes sense for compression.
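
For anyone checking the arithmetic, here is a minimal sketch that works out the ratio from the sizes quoted in this thread. The function and variable names are made up for illustration, and it assumes the two directory sizes cover the same set of messages.

```python
def compression_ratio(compressed_gb: float, original_gb: float) -> float:
    """Return the compressed size as a fraction of the original size."""
    if original_gb <= 0:
        raise ValueError("original size must be positive")
    return compressed_gb / original_gb


# Sizes of ~/data/Topic-0/ as quoted in this thread (hypothetical names).
DEST_GB = 25.0               # destination broker
SOURCE_GB_CORRECTED = 91.0   # source broker, corrected figure
SOURCE_GB_TYPO = 910.0       # source broker, figure from the original post

corrected = compression_ratio(DEST_GB, SOURCE_GB_CORRECTED)
typo = compression_ratio(DEST_GB, SOURCE_GB_TYPO)

print(f"corrected: {corrected:.1%} of source size, {1 / corrected:.1f}x smaller")
print(f"typo:      {typo:.1%} of source size, {1 / typo:.1f}x smaller")
```

With the corrected figure the destination is a bit over a quarter of the source size, which looks plausible for gzip; the 25/910 figure would have implied a reduction about ten times larger.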

Thanks,
Raja.
On Thu, Aug 22, 2013 at 7:00 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:

> When you state the numbers, are they the same across instances in the
> cluster, meaning that Topic-0 would have 910*5 GB in source cluster and
> 25*5 GB in target cluster?
>
> Another possibility is that MirrorMaker uses compression on the producer
> side, but I would be surprised if the compression rate could be 25/910.
>
> Guozhang
>
>
> On Thu, Aug 22, 2013 at 3:48 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
>
> > Yes, both source and target clusters have 5 brokers in cluster.
> >
> > Sent from my iPhone
> >
> > On Aug 22, 2013, at 6:11 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
> >
> > > Hello Rajasekar,
> > >
> > > Are the sizes of the source cluster and target cluster the same?
> > >
> > > Guozhang
> > >
> > >
> > > On Thu, Aug 22, 2013 at 2:14 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > >
> > >> Hi,
> > >>
> > >> We are using MirrorMaker to replicate data between two Kafka clusters.
> > >> I am seeing a huge difference in the size of the log in the data dir
> > >> between a broker in the source cluster and a broker in the destination
> > >> cluster:
> > >>
> > >> For example, the size of ~/data/Topic-0/ is about 910 G on the source
> > >> broker, but it is only 25G on the destination broker. I see a segmented
> > >> log file (~500 M) created about every 2 or 3 mins on the source brokers,
> > >> but only about every 25 mins on the destination broker.
> > >>
> > >> I verified that MirrorMaker is doing fine using the consumer offset
> > >> checker: not much lag, and offsets are incrementing. I also verified
> > >> that topics/partitions are not under-replicated in either the source or
> > >> the target cluster. What is the reason for this difference in disk usage?
> > >>
> > >>
> > >> --
> > >> Thanks,
> > >> Raja.
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>

--
Thanks,
Raja.