I worked with a small Hadoop cluster while developing a new scheduler.
However, the cluster was used only for development and never in
production, so I am wondering what obstacles you face in typical
day-to-day cluster administration.
We have been in discussions with an ad company (which has its own
development team) about building a platform with HBase, Hadoop, and maybe
an in-memory database for caching. My part would be to set up a small
cluster (~5 nodes) that satisfies their requirements and to monitor its
behavior.
Because of my current job, I will probably not be available at their site
full-time, so I am wondering:
a) What tasks take up most of your time in cluster administration?
b) How many hours should I plan for administering the cluster once the
infrastructure and data are ready (this will probably be a long process)?
c) What tasks besides software updates, schema updates, monitoring, and
additional provisioning should I plan for?