MapReduce >> mail # user >> How does performance scale with the size of the data?


How does performance scale with the size of the data?
Assume we have a medium-sized cluster, say 20 nodes, that is dedicated to
a single job and cannot change in size.
Assume we are sorting a large data set. As we increase the size of the data
sorted, say from 100 GB to 1,000 GB to 10,000 GB, does the
time for the sort scale as N or as N log N? I have heard both answers, with
N log N coming largely from folks less familiar with Hadoop and
N from others with more experience, but I am skeptical. Has anyone done
tests, and can anyone contribute real data?
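
To make the two hypotheses concrete, here is a minimal sketch of what each
one would predict for the three data sizes above, relative to the 100 GB run.
It is arithmetic on the two models only, not measured data. The 100-byte
record size and the decimal GB (10^9 bytes) are assumptions made purely so
that N can be counted in records; neither comes from a real job.

```python
import math

# Data sizes from the question, in GB.
SIZES_GB = [100, 1_000, 10_000]

# ASSUMPTION (hypothetical): fixed 100-byte records and 1 GB = 10**9 bytes,
# chosen only so that N can be expressed as a record count.
RECORD_BYTES = 100
BYTES_PER_GB = 10**9


def record_count(size_gb):
    """Number of records in size_gb gigabytes under the assumptions above."""
    return size_gb * BYTES_PER_GB // RECORD_BYTES


def predicted_ratios(cost, counts):
    """Predicted run time for each count, relative to the first count."""
    base = cost(counts[0])
    return [cost(n) / base for n in counts]


counts = [record_count(s) for s in SIZES_GB]
linear = predicted_ratios(lambda n: n, counts)
n_log_n = predicted_ratios(lambda n: n * math.log2(n), counts)

for size, lin, nlogn in zip(SIZES_GB, linear, n_log_n):
    print(f"{size:>6} GB   N: {lin:7.1f}x   N log N: {nlogn:7.1f}x")
```

Under these assumptions the N model predicts 10x and 100x the 100 GB time,
while the N log N model predicts roughly 11.1x and 122.2x. The gap between
the two is small at this scale, so timings would likely need to be quite
repeatable before they could separate one model from the other.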

--
Steven M. Lewis PhD
Institute for Systems Biology
Seattle WA