HBase dev mailing list

Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
I need to apologize and clarify this statement…

First, running benchmarks on AWS is OK if you’re attempting to get a rough idea of how HBase will perform on a certain class of machines, for example comparing m1.large to m1.xlarge or m3.xlarge so that you can get a rough sense of scale for sizing.

However, in this thread, you’re talking about trying to figure out why a certain mechanism isn’t working.

You’re trying to track down why writes stall while working in a virtualized environment where you have no control over the machines, the network, or your storage.

Also, when you run the OS on a virtual machine, there are going to be ‘anomalies’ that you can’t explain, because the OS is running within a VM and can only report what it sees, not what could be happening underneath it at the hypervisor or host level.

So you may see a problem, but will never be able to find the cause.

On Jan 17, 2014, at 5:55 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Guys,
>
> Trying to benchmark on AWS is a waste of time. You end up chasing ghosts.
> You want to benchmark, you need to isolate your systems to reduce extraneous factors.
>
> You need real hardware, real network in a controlled environment.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
>> On Jan 16, 2014, at 12:34 PM, "Bryan Beaudreault" <[EMAIL PROTECTED]> wrote:
>>
>> This might be better on the user list? Anyway..
>>
>> How many IPC handlers are you giving?  m1.xlarge is very low on CPU.  Not only
>> does it have only 4 cores (more cores allow more concurrent threads with
>> less context switching), but those cores are severely underpowered.  I
>> would recommend at least c1.xlarge, which is only a bit more expensive.  If
>> you happen to be doing heavy GC, with 1-2 compactions running, and with
>> many writes incoming, you are quickly using up quite a bit of CPU.  What are
>> the load and CPU usage on 10.38.106.234:50010?
>>
>> Did you see anything about blocking updates in the HBase logs?  How much
>> memstore are you giving?
>>
>>
>>> On Thu, Jan 16, 2014 at 1:17 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>>>
>>> On Wed, Jan 15, 2014 at 5:32 PM,
>>> Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
>>>
>>>> Yes, I am using ephemeral (local) storage. I found that iostat is idle most
>>>> of the time at 3K load, with periodic bursts of up to 10% iowait.
>>>
>>> Ok, sounds like the problem is higher up the stack.
>>>
>>> I see in later emails on this thread a log snippet that shows an issue with
>>> the WAL writer pipeline: one of the datanodes is slow, sick, or partially
>>> unreachable. If you have uneven point-to-point ping times among your
>>> cluster instances, or periodic loss, it might still be AWS's fault;
>>> otherwise I wonder why the DFSClient says a datanode is sick.
>>>
>>> --
>>> Best regards,
>>>
>>>  - Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>>
>
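
Bryan's question about blocking updates in the HBase logs can be checked mechanically rather than by eye. The following is only a minimal sketch, not anything posted in the thread: the function name `blocking_updates_per_minute` is hypothetical, it does a plain case-insensitive substring match on "blocking updates" rather than relying on the exact log message text, and it assumes each log line begins with a `YYYY-MM-DD HH:MM:SS` style timestamp. Lines that match but carry no such timestamp are counted under "unknown".

```python
import re
from collections import Counter

# Assumption: log lines start with a timestamp such as "2014-01-16 12:34:56,789".
# Only the date, hour, and minute are captured, so counts are bucketed per minute.
TIMESTAMP_MINUTE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2})")


def blocking_updates_per_minute(lines, needle="blocking updates"):
    """Count lines containing `needle` (case-insensitive), grouped by minute."""
    counts = Counter()
    for line in lines:
        if needle in line.lower():
            match = TIMESTAMP_MINUTE.match(line)
            # Matching lines without a recognizable timestamp go in one bucket.
            key = match.group(1) if match else "unknown"
            counts[key] += 1
    return counts
```

Passing an open region server log file as `lines` and printing the sorted items gives a per-minute count; if the busy minutes line up with the observed write stalls, memstore pressure is a plausible suspect, and if they do not, the slow datanode that Andrew points to looks more likely.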