-access patterns investigation to dynamically toggle the replication factor in a Hadoop cluster
As part of the research for an ongoing project, we are interested in
investigating whether data access patterns on a Hadoop cluster can be
predicted. The purpose is to study file access patterns as a time
series, so that data can be manipulated proactively.
For example, this may involve increasing or decreasing the replication
factor of files in HDFS on an Apache Hadoop cluster to deal with a
predicted upcoming increase or decrease in data accesses.
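As a rough illustration of the idea (a sketch under stated assumptions,
not an existing implementation), the prediction-to-replication step could
look like the Python below. The moving-average forecast, the thresholds
and all function names are hypothetical; the only real interface used is
the HDFS shell command `hdfs dfs -setrep`, which the sketch builds but
does not run.

```python
import math
from statistics import mean

# Hypothetical bounds and capacity; 3 is the usual HDFS default replication.
MIN_REPLICAS = 3
MAX_REPLICAS = 10
ACCESSES_PER_REPLICA = 100  # assumed accesses one replica can serve per interval


def predict_next_accesses(history, window=3):
    """Naive forecast: mean of the last `window` access counts."""
    if not history:
        return 0.0
    return mean(history[-window:])


def target_replication(predicted_accesses):
    """Map a predicted access count to a replication factor within the bounds."""
    needed = math.ceil(predicted_accesses / ACCESSES_PER_REPLICA)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, needed))


def setrep_command(path, replicas):
    """Build (but do not execute) the HDFS shell command for this change."""
    return ["hdfs", "dfs", "-setrep", str(replicas), path]


# Example: per-interval access counts for one (hypothetical) file.
history = [120, 480, 900]
replicas = target_replication(predict_next_accesses(history))
command = setrep_command("/data/logs/part-00000", replicas)
```

A real predictor would replace the moving average with a proper time
series model; the clamping to a minimum and maximum is there so that a
bad forecast cannot drop replication below a safe level or flood the
cluster with copies.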
So we would like your advice on a few questions:
1) is this the correct mailing list? :)
2) would changing the replication factor translate into better performance
for an MR job (either from your own experience, or from a report/paper etc.
that has studied this)?
3) do you find this interesting in general and something we should pursue?
4) are you aware of any related work on the topic we could use as a
Thanks for your help,