How does performance scale with the size of the data?
Steve Lewis 2010-07-01, 05:15
Assume we have a medium-sized cluster, say 20 nodes, that is dedicated to one job and cannot change in size. Assume we are sorting a large data set. As we increase the size of the data sorted, say from 100 GB to 1,000 GB to 10,000 GB, does the time for the sort scale as N or as N log N?

I have heard both answers, with N log N coming largely from folks less familiar with Hadoop and N from others with more experience, but I am skeptical. Has anyone done tests and can contribute real data?

--
Steven M. Lewis PhD
Institute for Systems Biology
Seattle WA
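
To make the two hypotheses concrete, here is a minimal sketch of what each one predicts for the three input sizes in the question. It is not a measurement and not a model of Hadoop itself. It assumes run time is proportional to either N or N log N with all constant factors ignored, and it assumes a fixed record size of 100 bytes (`RECORD_BYTES`, a hypothetical value not stated in the question) to turn gigabytes into a record count N.

```python
import math

RECORD_BYTES = 100  # assumed record size; not stated in the question


def predicted_ratios(sizes_gb, model):
    """Return predicted run times relative to the smallest input size."""
    # Convert each input size to a record count N.
    records = [gb * 10**9 / RECORD_BYTES for gb in sizes_gb]
    if model == "linear":
        cost = list(records)
    elif model == "nlogn":
        cost = [n * math.log2(n) for n in records]
    else:
        raise ValueError(f"unknown model: {model}")
    # Normalize so the smallest input has relative time 1.0.
    return [c / cost[0] for c in cost]


if __name__ == "__main__":
    sizes = [100, 1000, 10000]  # GB, as in the question
    for model in ("linear", "nlogn"):
        ratios = predicted_ratios(sizes, model)
        print(model, [round(r, 1) for r in ratios])
```

Under these assumptions the linear model predicts 10x and 100x the baseline time, while the N log N model predicts roughly 11.1x and 122.2x. The two curves differ by only about 11% per tenfold increase, so any test data would need run-to-run timing noise well below that to tell them apart.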