Suppose I have a large archive in HDFS, say 500 files totaling 4 GB, and I want to make it available as a YARN LocalResource. The archive doesn't change very often (maybe once per month). Does YARN optimize for this case? Specifically, does the expanded per-node cache persist across application runs, using something like the modification time to decide whether re-expansion is needed?
If the archive is instead re-expanded on each node every time the app is launched, should I raise its HDFS replication factor to reduce cross-rack bandwidth?
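
To make the question concrete, the behavior being asked about can be modeled as a per-node cache keyed on the archive's path plus its modification time. The following is only a toy Java sketch of that idea, not YARN's actual implementation; the class `ArchiveCacheSketch`, its methods, and the example path are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: this is NOT how YARN is implemented.
// All names here are hypothetical and exist purely for illustration.
class ArchiveCacheSketch {
    // HDFS path -> modification time of the copy already expanded on this node.
    private final Map<String, Long> expandedMtimeByPath = new HashMap<>();
    private int expansions = 0;

    // Returns true if the archive had to be (re-)expanded,
    // false if the previously expanded copy could be reused.
    boolean localize(String hdfsPath, long modificationTime) {
        Long cached = expandedMtimeByPath.get(hdfsPath);
        if (cached != null && cached == modificationTime) {
            return false; // same path and same mtime: reuse the expanded copy
        }
        // First use, or the archive changed since it was last expanded.
        expandedMtimeByPath.put(hdfsPath, modificationTime);
        expansions++;
        return true;
    }

    int expansionCount() {
        return expansions;
    }

    public static void main(String[] args) {
        ArchiveCacheSketch node = new ArchiveCacheSketch();
        String path = "hdfs:///apps/data/archive.zip"; // hypothetical path

        node.localize(path, 1000L); // first app run: expand
        node.localize(path, 1000L); // second app run, archive unchanged: reuse
        node.localize(path, 2000L); // archive updated: expand again

        System.out.println("expansions: " + node.expansionCount()); // prints "expansions: 2"
    }
}
```

If the cache behaves like this across application runs, the 4 GB transfer happens once per node per archive update, and the replication factor matters much less; if not, every launch pays the full cost on every node.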