HBase, mail # dev - HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)


Vladimir Rodionov 2014-01-15, 20:33
Nick Dimiduk 2014-01-15, 22:14
Andrew Purtell 2014-01-15, 23:26
Andrew Purtell 2014-01-15, 23:27
Vladimir Rodionov 2014-01-16, 01:32
Andrew Purtell 2014-01-16, 18:17
Bryan Beaudreault 2014-01-16, 18:33
lars hofhansl 2014-01-16, 20:58
Vladimir Rodionov 2014-01-16, 21:08
Michael Segel 2014-01-17, 11:55
Michael Segel 2014-01-17, 15:03
Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
Nick Dimiduk 2014-01-17, 18:54
You can also grep your logs for entries from JvmPauseMonitor. When you're
trying to track down an anomaly, it can at least help you narrow down when it
happened. Correlate that back against your Ganglia, OpenTSDB, CloudWatch, &c.
systems and look for more info. The message suggests "this was probably a long
GC," but on EC2 it's just as often something else. Use that as evidence when
you go back to your AWS support rep and argue for a refund.
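
A minimal sketch of that grep in script form, so the pauses can be listed by
timestamp for correlation with the monitoring systems mentioned above. It
assumes the Hadoop-style JvmPauseMonitor message ("... pause of approximately
NNNNms") and the default log4j timestamp prefix; the log path is a placeholder.

#!/usr/bin/env python
# Sketch: pull JvmPauseMonitor entries out of a region server log and print
# them with timestamps, so they can be lined up against Ganglia/OpenTSDB/
# CloudWatch graphs. Assumes the Hadoop-style pause message and a log4j
# "%d{ISO8601}" timestamp prefix; the default log path below is a placeholder.
import re
import sys

# e.g. 2014-01-17 18:54:01,123 WARN org.apache.hadoop.hbase.util.JvmPauseMonitor:
#      Detected pause in JVM or host machine (eg GC): pause of approximately 4312ms
LINE_RE = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*JvmPauseMonitor.*?'
    r'approximately (\d+)ms')

def find_pauses(path):
    pauses = []
    with open(path) as log:
        for line in log:
            m = LINE_RE.search(line)
            if m:
                pauses.append((m.group(1), int(m.group(2))))
    return pauses

if __name__ == '__main__':
    log_path = sys.argv[1] if len(sys.argv) > 1 else 'hbase-regionserver.log'
    for ts, ms in find_pauses(log_path):
        print('%s  pause ~%d ms' % (ts, ms))

A hit only tells you when a pause was observed; as noted above, on EC2 it could
just as easily be the hypervisor stealing time as a long GC.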

-n

On Fri, Jan 17, 2014 at 7:03 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> I need to apologize and clarify this statement…
>
> First, running benchmarks on AWS is ok, if you’re attempting to get a
> rough idea of how HBase will perform on a certain class of machines and
> you’re comparing m1.large to m1.xlarge or m3.xlarge … so that you can get a
> rough scale on sizing.
>
> However, in this thread, you’re talking about trying to figure out why a
> certain mechanism isn’t working.
>
> You’re trying to track down why writes stall when you’re working in a
> virtualized environment where not only do you not have control over the
> machines, but also the network and your storage.
>
> Also when you run the OS on a virtual machine, there are going to be
> ‘anomalies’ that you can’t explain, because the OS is running within a VM
> and can only report what it sees, not what could be happening underneath it
> in the hypervisor or host.
>
> So you may see a problem, but will never be able to find the cause.
>
>
> On Jan 17, 2014, at 5:55 AM, Michael Segel <[EMAIL PROTECTED]>
> wrote:
>
> > Guys,
> >
> > Trying to benchmark on AWS is a waste of time. You end up chasing ghosts.
> > If you want to benchmark, you need to isolate your systems to reduce
> > extraneous factors.
> >
> > You need real hardware and a real network in a controlled environment.
> >
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
> >> On Jan 16, 2014, at 12:34 PM, "Bryan Beaudreault" <[EMAIL PROTECTED]> wrote:
> >>
> >> This might be better on the user list? Anyway..
> >>
> >> How many IPC handlers are you giving?  m1.xlarge is very low on CPU.  Not
> >> only does it have only 4 cores (more cores allow more concurrent threads
> >> with less context switching), but those cores are severely underpowered.
> >> I would recommend at least c1.xlarge, which is only a bit more expensive.
> >> If you happen to be doing heavy GC, with 1-2 compactions running, and with
> >> many writes incoming, you are quickly using up quite a bit of CPU.  What
> >> is the load and CPU usage on 10.38.106.234:50010?
> >>
> >> Did you see anything about blocking updates in the HBase logs?  How much
> >> memstore are you giving?
> >>
> >>
> >>> On Thu, Jan 16, 2014 at 1:17 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> >>>
> >>> On Wed, Jan 15, 2014 at 5:32 PM, Vladimir Rodionov
> >>> <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Yes, I am using ephemeral (local) storage. I found that iostat is most
> >>>> of the time idle on 3K load with periodic bursts up to 10% iowait.
> >>>
> >>> Ok, sounds like the problem is higher up the stack.
> >>>
> >>> I see in later emails on this thread a log snippet that shows an issue
> >>> with the WAL writer pipeline: one of the datanodes is slow, sick, or
> >>> partially unreachable. If you have uneven point-to-point ping times
> >>> among your cluster instances, or periodic loss, it might still be AWS's
> >>> fault; otherwise I wonder why the DFSClient says a datanode is sick.
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>>  - Andy
> >>>
> >>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >>> (via Tom White)
> >>>
> >
>
>
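
Bryan's questions above about IPC handler count and memstore sizing map to a
handful of hbase-site.xml properties. A minimal sketch that prints them (the
config path is a placeholder, and the fallback values are the 0.94 defaults as
best recalled, so verify them against your own hbase-default.xml):

#!/usr/bin/env python
# Sketch: print the write-path settings Bryan asks about (IPC handler count,
# memstore sizing, and the blocking multiplier behind "Blocking updates"
# messages) as configured in hbase-site.xml. The path is a placeholder and the
# fallback defaults are as recalled for 0.94 -- check against hbase-default.xml.
import xml.etree.ElementTree as ET

SETTINGS = {
    'hbase.regionserver.handler.count': '10',
    'hbase.regionserver.global.memstore.upperLimit': '0.4',
    'hbase.hregion.memstore.flush.size': '134217728',
    'hbase.hregion.memstore.block.multiplier': '2',
}

def site_overrides(path):
    overrides = {}
    for prop in ET.parse(path).getroot().findall('property'):
        name = prop.findtext('name')
        if name:
            overrides[name] = prop.findtext('value')
    return overrides

if __name__ == '__main__':
    overrides = site_overrides('/etc/hbase/conf/hbase-site.xml')  # placeholder path
    for key, default in sorted(SETTINGS.items()):
        source = 'site' if key in overrides else 'default'
        print('%-48s = %-12s (%s)' % (key, overrides.get(key, default), source))

The blocking-updates message shows up once a region's memstore reaches
flush.size x block.multiplier, so those two values together bound how far
writes can run ahead of flushes.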
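
And for Andy's point about uneven point-to-point latency among cluster
instances, a rough sketch that times TCP connects to each datanode's data
transfer port (50010, as in the 10.38.106.234:50010 mentioned above). The host
list is a placeholder and a connect time is only a crude stand-in for ICMP
ping, but uneven or spiky numbers would support the slow-WAL-pipeline-node
theory:

#!/usr/bin/env python
# Sketch: crude point-to-point latency check from this host to each datanode's
# data transfer port. The host list is a placeholder for your cluster's
# datanode IPs; failures are counted rather than timed.
import socket
import time

DATANODES = ['10.38.106.234']   # placeholder: list your datanode IPs here
PORT = 50010                    # default DataNode data transfer port
SAMPLES = 5

def connect_times(host, port, samples):
    times = []
    for _ in range(samples):
        start = time.time()
        try:
            sock = socket.create_connection((host, port), timeout=2)
            sock.close()
            times.append((time.time() - start) * 1000.0)
        except socket.error:
            times.append(None)   # record a failed attempt as a missing sample
    return times

if __name__ == '__main__':
    for host in DATANODES:
        results = connect_times(host, PORT, SAMPLES)
        ok = [t for t in results if t is not None]
        failed = len(results) - len(ok)
        if ok:
            print('%s: min %.1f ms, max %.1f ms, failures %d'
                  % (host, min(ok), max(ok), failed))
        else:
            print('%s: all %d connect attempts failed' % (host, SAMPLES))
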
Ted Yu 2014-01-17, 19:11
Andrew Purtell 2014-01-17, 18:21
Vladimir Rodionov 2014-01-17, 18:33
lars hofhansl 2014-01-16, 05:17
Vladimir Rodionov 2014-01-16, 05:49
谢良 2014-01-16, 05:55
Vladimir Rodionov 2014-01-16, 06:45
lars hofhansl 2014-01-16, 07:13
谢良 2014-01-16, 08:24
Vladimir Rodionov 2014-01-16, 14:46
lars hofhansl 2014-01-17, 16:49
Vladimir Rodionov 2014-01-15, 22:07