How does performance scale with the size of the data?
Steve Lewis 2010-07-01, 05:15
Assume we have a medium-sized cluster, say 20 nodes, and that the cluster is
used for one job and cannot change in size.
Assume we are sorting a large data set. As we increase the size of the data
sorted, say from 100GB to 1000GB to 10000GB, does the
time for the sort scale as N or as N log N? I have heard both answers, with
N log N coming largely from folks less familiar with Hadoop and
N from others with more experience, but I am skeptical. Has anyone done
tests and can contribute real data?
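
To make the question concrete, here is a rough Python sketch of what each model would predict for the three sizes above, relative to the 100GB run. It assumes N is the record count and that records are 100 bytes each; that record size and the function names are illustrative assumptions, not measurements from any cluster.

```python
import math

RECORD_BYTES = 100  # assumed record size, purely illustrative


def records(size_gb):
    """Number of records in size_gb gigabytes (1 GB taken as 10**9 bytes)."""
    return size_gb * 10**9 // RECORD_BYTES


def relative_time(size_gb, base_gb=100, model="linear"):
    """Predicted sort time relative to the base_gb run under a cost model."""
    n, n0 = records(size_gb), records(base_gb)
    if model == "linear":
        return n / n0
    if model == "nlogn":
        return (n * math.log(n)) / (n0 * math.log(n0))
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    for size in (100, 1000, 10000):
        lin = relative_time(size, model="linear")
        nlogn = relative_time(size, model="nlogn")
        print(f"{size:>6} GB  N: {lin:8.2f}x   N log N: {nlogn:8.2f}x")
```

Under these assumptions the two models differ by only about 11% at 1000GB and about 22% at 10000GB, so telling them apart would need fairly careful timing.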
Steven M. Lewis PhD
Institute for Systems Biology