1. So I think you mean that for Hadoop, since it runs batch jobs, latency is
not the key concern, so time spent on swap is acceptable. But for HBase, the
normal use case is on-demand, semi-real-time queries, so we need to keep
memory swapping from impacting latency?
2. Suppose I have 4 mappers running as 4 JVMs on one machine. Does each of
them get its own dedicated physical memory for heap management (which means
that if one process consumes too much memory and causes swapping, it will
NOT impact the others)? Or do all the JVMs share the same physical memory
pool (which means that if one process consumes too much memory and causes
swapping, it will impact the others)?
3. Are there any best practices for avoiding swap in Hadoop and HBase use cases?
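
To make question 3 concrete, here is a rough sketch of the kind of memory budgeting I understand from "take into account the amount of memory on the machine when you configure the number of slots". Every name and number below is a hypothetical assumption for illustration, not a Hadoop or HBase setting, and it only counts JVM heap (a real JVM also uses some memory outside the heap):

```python
def total_heap_mb(task_slots, heap_per_task_mb, daemon_heaps_mb):
    """Sum of the configured heaps for all task JVMs plus the daemon JVMs."""
    return task_slots * heap_per_task_mb + sum(daemon_heaps_mb)


def fits_in_memory(physical_mb, os_reserve_mb, task_slots,
                   heap_per_task_mb, daemon_heaps_mb):
    """True if all configured heaps fit in RAM after reserving some for the OS."""
    budget_mb = physical_mb - os_reserve_mb
    return total_heap_mb(task_slots, heap_per_task_mb, daemon_heaps_mb) <= budget_mb


def max_task_slots(physical_mb, os_reserve_mb, heap_per_task_mb, daemon_heaps_mb):
    """Largest number of task slots whose heaps still fit alongside the daemons."""
    free_mb = physical_mb - os_reserve_mb - sum(daemon_heaps_mb)
    return max(0, free_mb // heap_per_task_mb)


# Hypothetical node: 16 GB RAM, 2 GB kept for the OS, three daemons
# (say 1 GB, 1 GB and 8 GB heaps), and 1 GB of heap per map/reduce task.
slots = max_task_slots(16384, 2048, 1024, [1024, 1024, 8192])
print(slots, fits_in_memory(16384, 2048, slots, 1024, [1024, 1024, 8192]))
```

With these made-up numbers the budget allows 4 task slots; a fifth slot would push the total configured heap past what is left after the OS reserve, which is the point where I would expect swapping to become possible.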
On Wed, Nov 7, 2012 at 12:27 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
> If you exceed the amount of physical memory available, memory pages will
> be written to disk in a temp space. The act of 'swapping' the memory pages
> from memory to disk and back again is known as 'swap'.
> HBase is highly sensitive to the latency of swapping memory pages between
> physical memory and disk. You need to avoid swap when running HBase.
> Swapping can crash a region server, and ultimately you can end up with a
> cascading failure that takes HBase down.
> On Nov 5, 2012, at 11:06 PM, Lin Ma <[EMAIL PROTECTED]> wrote:
> Thanks Michael,
> "If you are running just Hadoop, you could have a little swap. Running
> HBase, fuggit about it." -- could you give a bit more information about
> what you mean by swap, and why to forget about it for HBase?
> On Tue, Nov 6, 2012 at 12:46 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>> Mappers and Reducers are separate JVM processes.
>> And yes, you need to take into account the amount of memory on the
>> machine(s) when you configure the number of slots.
>> If you are running just Hadoop, you could have a little swap. Running
>> HBase, fuggit about it.
>> On Nov 5, 2012, at 7:12 PM, Lin Ma <[EMAIL PROTECTED]> wrote:
>> > Hello Hadoop experts,
>> > I have had a question on my mind for a long time. Suppose I am
>> > developing an M-R program, and it is Java based (a Java UDF that
>> > implements the mapper or reducer interface). My question is, in this
>> > scenario, is each mapper or reducer a separate JVM process? E.g., if
>> > there are 4 mappers on a machine, are they 4 individual processes? I am
>> > also wondering whether the processes on a single machine will impact
>> > each other when each JVM wants to get more memory to run faster.
>> > thanks in advance,
>> > Lin