Is it possible to isolate job submissions on a Hadoop cluster? We are currently running a 48-machine cluster, and we have found that Hadoop does not provide efficient resource isolation. In our case we run two pools, tech and research. A tech job had a memory leak and ended up occupying the whole cluster. We eventually traced the problem to that tech job, but it had already screwed up the whole Hadoop cluster: in the end, 10 data nodes were dead.
Is there an efficient way to control job submission and resource allocation so that when something goes wrong in a particular job, only that job's pool is affected and other jobs are not? Is there any way to achieve this?
Please guide me, guys.
My idea is this: when a tech user submits a job, the job should run only on a subset of the cluster, in my case 24 machines, and the other machines should be reserved for research users.
That would contain the memory leak problem.
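One way to approximate this split, without physically partitioning the cluster, is the Hadoop Fair Scheduler, which can cap how many task slots each pool may use at once. A minimal sketch of an `allocations.xml` follows; the pool names match this question, and the slot counts are illustrative, assuming roughly 4 map and 2 reduce slots per node on a 48-node cluster:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Cap the "tech" pool at about half the cluster's slots
       (illustrative: 24 nodes x 4 map slots, 24 nodes x 2 reduce slots). -->
  <pool name="tech">
    <maxMaps>96</maxMaps>
    <maxReduces>48</maxReduces>
  </pool>
  <!-- Guarantee the "research" pool a minimum share so a runaway
       tech job cannot starve it. -->
  <pool name="research">
    <minMaps>96</minMaps>
    <minReduces>48</minReduces>
  </pool>
</allocations>
```

Note that slot caps limit concurrency, not memory. To contain a leaking task you would also want per-task memory limits (for example `mapred.job.map.memory.mb` together with the cluster-wide `mapred.cluster.max.map.memory.mb` on Hadoop 1.x) so the TaskTracker kills tasks that exceed their limit instead of letting them take down the node.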