Re: Correlation between replication factor and read/write performance survey?
Ted Dunning 2013-02-12, 07:18
The delay due to replication is rarely a large problem in traditional
map-reduce programs, since many writes are occurring at once. The real
problem is that, with the default replication factor of 3, you are consuming
3x the total disk bandwidth. The theoretical maximum equilibrium write
bandwidth is therefore limited to the lesser of half your network bandwidth
(the first replica is written locally, so each block crosses the network
twice) or a third of your usable disk bandwidth. Usable disk bandwidth for
ordinary Hadoop typically reaches about half the raw bandwidth of the disks
themselves.
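The bound described above can be written as a small calculation. This is a minimal sketch, not a measured model: it assumes the writer runs on a datanode (so only replication - 1 copies cross the network), it generalizes the "half" and "a third" figures from a replication factor of 3 to an arbitrary factor, and the function name `max_write_bandwidth` and the hardware numbers in the usage line are hypothetical.

```python
def max_write_bandwidth(network_mb_s: float, raw_disk_mb_s: float,
                        replication: int = 3,
                        disk_efficiency: float = 0.5) -> float:
    """Upper bound (MB/s) on sustained logical write throughput.

    Hypothetical model: the first replica is written locally, the other
    (replication - 1) replicas cross the network, and Hadoop achieves
    `disk_efficiency` of the raw disk bandwidth.
    """
    if replication < 1:
        raise ValueError("replication must be >= 1")
    # Usable disk bandwidth is only a fraction of the raw bandwidth.
    usable_disk = raw_disk_mb_s * disk_efficiency
    # Every logical byte is written to disk `replication` times.
    disk_limit = usable_disk / replication
    # Every logical byte crosses the network (replication - 1) times.
    if replication == 1:
        net_limit = float("inf")
    else:
        net_limit = network_mb_s / (replication - 1)
    return min(disk_limit, net_limit)


if __name__ == "__main__":
    # Hypothetical node: ~1250 MB/s network, 12 disks x 100 MB/s raw.
    print(max_write_bandwidth(1250, 1200))  # disk-bound: 600 / 3 = 200.0
```

With these assumed numbers the disk limit (200 MB/s) is well below the network limit (625 MB/s), which matches the point that consuming 3x the disk bandwidth, not replication latency, is what caps ingest.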
On Mon, Feb 11, 2013 at 6:36 PM, Rishi Yadav <[EMAIL PROTECTED]> wrote:
> I think higher replication only makes reads easier, as the client can
> choose to read a block from the nearest node.
> Writes are done using a replication pipeline, so the client writes only to
> the first node but does wait for acks from all nodes. It would be
> interesting to see whether there are any benchmarks for the delay caused by
> this acknowledgement.
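To see why the acknowledgement delay tends to be small, the write pipeline can be simulated as a chain of stages that overlap. This is an idealized sketch under stated assumptions, not a benchmark of HDFS: every hop (client to first datanode, datanode to datanode) is assumed to take the same `hop_time` per packet, flow control and disk latency are ignored, and the final ack is assumed to travel back over the same number of hops. The function name and parameters are hypothetical.

```python
def pipeline_write_time(num_packets: int, replication: int,
                        hop_time: float, ack_hop_time: float = 0.0) -> float:
    """Idealized time to push packets through a `replication`-node pipeline
    and receive the ack for the last packet back at the client.

    Hypothetical model: uniform per-hop time, no flow control, and the
    client always has the next packet ready.
    """
    # done[s] = time at which hop s finished forwarding its latest packet.
    done = [0.0] * replication
    for _ in range(num_packets):
        prev = 0.0  # assumption: the client can send immediately
        for s in range(replication):
            # A hop starts a packet once it has arrived and the hop is free.
            start = max(prev, done[s])
            done[s] = start + hop_time
            prev = done[s]
    # The ack for the last packet returns over `replication` hops.
    return done[-1] + replication * ack_hop_time


if __name__ == "__main__":
    single = pipeline_write_time(1000, 1, hop_time=1.0, ack_hop_time=0.5)
    triple = pipeline_write_time(1000, 3, hop_time=1.0, ack_hop_time=0.5)
    print(single, triple)  # 1000.5 1003.5
```

Because the hops overlap, the last packet arrives after (num_packets + replication - 1) hop times rather than replication times as many, so in this model the extra cost of replication is only (replication - 1) x (hop_time + ack_hop_time) per block.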
> On Feb 11, 2013, at 6:42 AM, George Kousiouris <[EMAIL PROTECTED]> wrote:
> > Hi all,
> > Is anyone aware of any survey/paper/report showing the relationship
> > between a replication factor and its penalty/benefit on write/read
> > performance?
> > BR,
> > George