One mapper/reducer runs on a single JVM
Lin Ma 2012-11-06, 01:12
Hello Hadoop experts,
I have had a question in mind for a long time. Suppose I am developing an M-R program in Java (a Java UDF that implements the mapper or reducer interface). In this scenario, is each mapper or reducer a separate JVM process? E.g., if there are 4 mappers on a machine, are they 4 individual processes? I am also wondering whether the processes on a single machine will affect each other when each JVM wants more memory to run faster.
thanks in advance, Lin
Re: One mapper/reducer runs on a single JVM
Michael Segel 2012-11-06, 04:46
Mappers and Reducers are separate JVM processes. And yes, you need to take into account the amount of memory on the machine(s) when you configure the number of slots.
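As a rough illustration of the slot sizing mentioned above, the sketch below estimates how many task JVMs fit in physical memory. It is a back-of-the-envelope model, not Hadoop's own logic: the function names, the 25% per-JVM overhead on top of the heap, and the amount of memory reserved for the OS and daemons are all assumptions that should be replaced with measured values.

```python
def max_task_slots(physical_mb, reserved_mb, heap_per_task_mb, overhead_factor=1.25):
    """Estimate how many task JVMs fit in RAM without pushing the box into swap.

    physical_mb      -- total physical memory on the node
    reserved_mb      -- memory set aside for the OS and daemons (assumed figure)
    heap_per_task_mb -- maximum heap given to each task JVM
    overhead_factor  -- assumed multiplier for non-heap JVM memory (hypothetical 1.25)
    """
    if heap_per_task_mb <= 0:
        raise ValueError("heap_per_task_mb must be positive")
    if overhead_factor < 1.0:
        raise ValueError("overhead_factor must be at least 1.0")

    available_mb = physical_mb - reserved_mb
    if available_mb <= 0:
        return 0

    # Each mapper or reducer is its own JVM, so each one costs a full heap
    # plus its non-heap overhead.
    per_task_mb = heap_per_task_mb * overhead_factor
    return int(available_mb // per_task_mb)


def fits_in_memory(map_slots, reduce_slots, physical_mb, reserved_mb,
                   heap_per_task_mb, overhead_factor=1.25):
    """Return True if the configured slots stay within the estimated budget."""
    budget = max_task_slots(physical_mb, reserved_mb, heap_per_task_mb, overhead_factor)
    return map_slots + reduce_slots <= budget
```

For example, on a hypothetical 16 GB node with 4 GB reserved and 1 GB of heap per task, `max_task_slots(16384, 4096, 1024)` gives the slot budget, and `fits_in_memory(4, 2, 16384, 4096, 1024)` checks a 4-map, 2-reduce configuration against it. In Hadoop 1.x the slot counts are typically set with `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, and the per-task heap with `mapred.child.java.opts`.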
If you are running just Hadoop, you could have a little swap. Running HBase, fuggit about it.
Re: One mapper/reducer runs on a single JVM
Lin Ma 2012-11-06, 05:06
Thanks Michael,
"If you are running just Hadoop, you could have a little swap. Running HBase, fuggit about it." -- could you give a bit more information about what you mean by swap, and why to forget about it for HBase?
regards, Lin
Re: One mapper/reducer runs on a single JVM
Michael Segel 2012-11-06, 16:27
If you exceed the amount of physical memory available, memory pages will be written to disk in a temp space. The act of 'swapping' the memory pages from memory to disk and back again is known as 'swap'.
HBase is highly sensitive to the latency of swapping memory in and out of physical memory to disk. You need to avoid swap when running HBase. Swapping can crash a region server, and ultimately you can end up with a cascading failure in which HBase goes down.
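One way to see whether a node is already dipping into swap is to compare the swap counters the kernel reports. The sketch below assumes a Linux host, where `/proc/meminfo` exposes `SwapTotal` and `SwapFree` in kB; the helper names are hypothetical and this is a monitoring aid, not something Hadoop or HBase ships.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Return how much swap is in use, in kB (0 if no swap is reported)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a region server, a `swap_used_kb(read_meminfo())` value that keeps growing is a hint that the JVMs on the box are asking for more memory than is physically available.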
HTH
-Mike
Re: One mapper/reducer runs on a single JVM
Lin Ma 2012-11-07, 12:32
Thanks Mike.
1. So I think you mean that for Hadoop, since it runs batch jobs, latency is not the key concern, so time spent on swap is acceptable. But for HBase, the normal use case is on-demand, semi-real-time queries, so we need to keep swap from hurting latency?
2. Suppose I have 4 mappers running as 4 JVMs on one machine. Does each of them get a dedicated, exclusive region of physical memory for its heap (meaning that one process consuming too much memory and causing swap will NOT affect the others)? Or do all the JVMs share the same physical memory pool (meaning that one process consuming too much memory and causing swap will affect the others)?
3. Are there any best practices for avoiding swap in Hadoop and HBase use cases?
regards, Lin