HBase dev list
Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
I generally agree. However, the "High I/O" and "Cluster Compute" instance
types are HVM and single-tenant on the server, the I/O stack uses SR-IOV so
MMIO and interrupts go directly to the VM, and the 10GE network paths carry
no traffic but your own. The locally attached storage is SSD. This is pretty
close to what you'd have in your own data center or a colo. And damn
expensive, but good if you can afford it.
On Fri, Jan 17, 2014 at 7:03 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> I need to apologize and clarify this statement…
>
> First, running benchmarks on AWS is ok, if you’re attempting to get a
> rough idea of how HBase will perform on a certain class of machines and
> you’re comparing m1.large to m1.xlarge or m3.xlarge … so that you can get a
> rough scale on sizing.
>
> However, in this thread, you’re talking about trying to figure out why a
> certain mechanism isn’t working.
>
> You’re trying to track down why writes stall while you’re working in a
> virtualized environment where you have no control over the machines, the
> network, or your storage.
>
> Also, when you run the OS in a virtual machine, there are going to be
> ‘anomalies’ that you can’t explain, because the guest OS can only report
> what it sees, not what may be happening underneath at the hypervisor level.
>
> So you may see a problem, but will never be able to find the cause.
>
>
> On Jan 17, 2014, at 5:55 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
> > Guys,
> >
> > Trying to benchmark on AWS is a waste of time. You end up chasing ghosts.
> > If you want to benchmark, you need to isolate your systems to reduce
> > extraneous factors.
> >
> > You need real hardware, real network in a controlled environment.
> >
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
> >> On Jan 16, 2014, at 12:34 PM, "Bryan Beaudreault" <[EMAIL PROTECTED]> wrote:
> >>
> >> This might be better on the user list? Anyway..
> >>
> >> How many IPC handlers are you giving?  m1.xlarge has very little CPU.
> >> Not only does it have just 4 cores (more cores allow more concurrent
> >> threads with less context switching), but those cores are severely
> >> underpowered.  I would recommend at least c1.xlarge, which is only a bit
> >> more expensive.  If you happen to be doing heavy GC, with 1-2 compactions
> >> running, and with many writes incoming, you quickly use up quite a bit of
> >> CPU.  What are the load and CPU usage on 10.38.106.234:50010?
> >>
> >> Did you see anything about blocking updates in the hbase logs?  How much
> >> memstore are you giving?
> >>
> >>
> >>> On Thu, Jan 16, 2014 at 1:17 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> >>>
> >>> On Wed, Jan 15, 2014 at 5:32 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Yes, I am using ephemeral (local) storage. I found that iostat is idle
> >>>> most of the time under a 3K load, with periodic bursts up to 10% iowait.
> >>>
> >>> Ok, sounds like the problem is higher up the stack.
> >>>
> >>> I see in later emails on this thread a log snippet that shows an issue
> >>> with the WAL writer pipeline: one of the datanodes is slow, sick, or
> >>> partially unreachable. If you have uneven point-to-point ping times among
> >>> your cluster instances, or periodic loss, it might still be AWS's fault;
> >>> otherwise I wonder why the DFSClient says a datanode is sick.
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>>  - Andy
> >>>
> >>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >>> (via Tom White)
> >>>
> >
>
>
--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
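
Bryan's questions above about handler count, memstore sizing, and blocking
updates map onto a handful of standard hbase-site.xml keys. The sketch below
assumes the 0.94-era key names and an hbase-site.xml on the classpath, and
just prints the effective values on a region server node; the class name and
the fallback defaults shown are illustrative, not authoritative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ShowWritePathConfig {
      public static void main(String[] args) {
        // Loads hbase-default.xml and hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        // RPC/IPC handler threads per region server ("How many IPC handlers...").
        System.out.println("hbase.regionserver.handler.count = "
            + conf.getInt("hbase.regionserver.handler.count", 10));
        // Per-region memstore flush threshold ("How much memstore are you giving?").
        System.out.println("hbase.hregion.memstore.flush.size = "
            + conf.getLong("hbase.hregion.memstore.flush.size", 134217728L));
        // Updates on a region are blocked at multiplier * flush size.
        System.out.println("hbase.hregion.memstore.block.multiplier = "
            + conf.getInt("hbase.hregion.memstore.block.multiplier", 2));
        // Global memstore pressure limits, as fractions of the region server heap.
        System.out.println("hbase.regionserver.global.memstore.upperLimit = "
            + conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f));
        System.out.println("hbase.regionserver.global.memstore.lowerLimit = "
            + conf.getFloat("hbase.regionserver.global.memstore.lowerLimit", 0.35f));
      }
    }

Blocking updates in particular are driven by the per-region limit (flush size
times the block multiplier) and by the global upper limit across the heap, so
those are the first numbers to check against the incoming write volume.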
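
Andrew's suggestion to look at point-to-point latency between cluster
instances can be spot-checked without touching HBase by timing plain TCP
connects to each DataNode's data transfer port. This is a rough sketch,
assuming the default port 50010; the single host in the list is a placeholder
taken from the address quoted in the thread, and a real check would loop over
the whole slaves list.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class DataNodeConnectProbe {
      // Placeholder host list; substitute your own DataNode addresses.
      private static final String[] DATANODES = { "10.38.106.234" };
      private static final int XCEIVER_PORT = 50010;  // default DataNode transfer port

      public static void main(String[] args) throws InterruptedException {
        for (String host : DATANODES) {
          for (int i = 0; i < 10; i++) {
            long start = System.nanoTime();
            try (Socket socket = new Socket()) {
              // 2 second connect timeout; a healthy LAN/VPC path connects in well under 1 ms.
              socket.connect(new InetSocketAddress(host, XCEIVER_PORT), 2000);
              long micros = (System.nanoTime() - start) / 1000L;
              System.out.println(host + " connect " + i + ": " + micros + " us");
            } catch (IOException e) {
              System.out.println(host + " connect " + i + " FAILED: " + e.getMessage());
            }
            Thread.sleep(500);  // space the probes out a bit
          }
        }
      }
    }

Uneven or spiky connect times between otherwise identical instances would
point back at the AWS network path; consistently flat sub-millisecond times
would suggest looking higher in the stack for the cause of the WAL pipeline
complaints.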