Distributed cache: how big is too big?
I am researching a Hadoop solution for an existing application that requires a directory structure full of data for processing.
To make the Hadoop solution work, I need to deploy the data directory to each DN when the job is executed. I know this isn't new and is commonly done with the Distributed Cache.
Based on experience, what are the common file sizes deployed in a Distributed Cache? I know smaller is better, but how big is too big? I have read that the larger the cache deployed, the longer the startup latency. I also assume there are other factors that play into this.
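
For context, here is roughly how I'd expect to wire it up: tar up the directory, push it through the Distributed Cache, and let it be unpacked next to each task. This is just a minimal sketch against the MR1 DistributedCache API; the HDFS path and archive name are placeholders:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class CacheDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Ship the whole data directory as one archive; the TaskTracker unpacks it
        // into the task's local cache. The "#datadir" fragment creates a symlink of
        // that name in the task's working directory.
        // (Placeholder path -- the real archive lives wherever it is staged in HDFS.)
        DistributedCache.addCacheArchive(
                new URI("/apps/myapp/datadir.tar.gz#datadir"), conf);
        DistributedCache.createSymlink(conf);

        Job job = new Job(conf, "cache-example");
        job.setJarByClass(CacheDriver.class);
        // ... set mapper/reducer, input/output paths ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

In the tasks I'd then read from the "datadir" symlink in the working directory (or look it up via DistributedCache.getLocalCacheArchives()).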
I know that:
- Default local.cache.size = 10 GB
- Range of desirable sizes for a Distributed Cache = 10 KB - 1 GB??
- Distributed Cache is normally not used if larger than = ____?
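
As I understand it, local.cache.size is the TaskTracker-side limit (in bytes) on the local cache directory before old entries get purged, and the 10 GB figure is just its default. A quick sanity check of what a given config resolves to (assuming the MR1 property name):

import org.apache.hadoop.conf.Configuration;

public class CacheLimitCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // 10737418240 bytes == 10 GB, the stock default for local.cache.size.
        long limitBytes = conf.getLong("local.cache.size", 10737418240L);
        System.out.printf("local.cache.size = %d bytes (%.1f GB)%n",
                limitBytes, limitBytes / (1024.0 * 1024 * 1024));
    }
}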
Another option: put the data directories on each DN ahead of time and provide the location to the TaskTracker?
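
If I went that route, I imagine it would look something like this: the directory is pre-installed at the same local path on every node, and the job just passes that path through the configuration. A sketch only; the property name and path are made up:

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LocalDataMapper extends Mapper<LongWritable, Text, Text, Text> {
    private File dataDir;

    @Override
    protected void setup(Context context) throws IOException {
        // "myapp.local.data.dir" is a made-up job property; the driver would set it
        // with conf.set("myapp.local.data.dir", "/data/myapp"), and ops would have
        // to keep /data/myapp identical on every node running a TaskTracker.
        String path = context.getConfiguration().get("myapp.local.data.dir", "/data/myapp");
        dataDir = new File(path);
        if (!dataDir.isDirectory()) {
            throw new IOException("Expected local data directory at " + path);
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... use files under dataDir while processing each record ...
        context.write(new Text(dataDir.getPath()), value);
    }
}

The tradeoff, as far as I can tell, is that this avoids the per-job localization cost but pushes the distribution problem onto whatever provisions the nodes.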
thanks,
John
     