I'm running a program whose streaming layer automatically multithreads by detecting the number of cores on the machine. I realize this model is somewhat in conflict with Hadoop, but nonetheless, that's what I'm doing. Thus, for even resource utilization, it would be nice to assign not one mapper per core, but at most one mapper per machine. I realize that none of this matters if I saturate the cluster, but consider the following example for clarity: 4-core nodes, a 10-node cluster, thus 40 slots, fully configured across mappers and reducers (40 slots of each). Say I run this program with just two mappers. It would run much more efficiently (in essentially half the time) if I could force the two mappers onto slots on two separate machines instead of running the risk that Hadoop assigns them both to the same machine.
Can this be done?
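One possible approach, assuming the classic (pre-YARN, Hadoop 1.x) MapReduce framework implied by the slot terminology above: reduce the map slot count on each TaskTracker to one. Note this is a cluster-wide setting, not a per-job one, so it would cap every job's map parallelism per node, not just this program's:

```xml
<!-- mapred-site.xml on each TaskTracker node (Hadoop 1.x) -->
<!-- Limits each node to running at most one map task at a time. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```

After changing this, the TaskTrackers would need to be restarted for the new slot count to take effect.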
________________________________________________________________________________ Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
"Yet mark his perfect self-contentment, and hence learn his lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy." -- Edwin A. Abbott, Flatland ________________________________________________________________________________
In theory this should work: find the part of the Hadoop code that calculates the number of cores and patch it to always return one.

On Wed, Jan 29, 2014 at 3:41 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
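A less invasive variant of the same idea, assuming the streaming program's own source can be modified rather than Hadoop's: make the core detection overridable via an environment variable. This is a hypothetical sketch; the `WORKER_THREADS` variable name is an invention for illustration, not anything Hadoop or the original program defines:

```python
import os

def worker_count():
    """Return the number of worker threads to spawn."""
    # Hypothetical override: WORKER_THREADS is an invented variable
    # name, not a Hadoop setting. If present, it wins over detection.
    override = os.environ.get("WORKER_THREADS")
    if override is not None:
        return max(1, int(override))
    # Fall back to the original behavior: use all detected cores.
    return os.cpu_count() or 1
```

The job could then be launched with `WORKER_THREADS=1` in the task environment to pin each mapper to a single thread, without touching the detection logic for other deployments.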