HBase >> mail # user >> Optimizing bulk load performance


Re: Optimizing bulk load performance
RPCs: remote procedure calls to a server. Just forget about it ;) Please
check the network bandwidth between your nodes.
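JM suggests netperf for this check further down in the quoted thread. Where it isn't available, the idea can be sketched in Python as a single-stream TCP throughput test. This is a minimal sketch, not a replacement for netperf: the function names are hypothetical, and `loopback_demo` runs both ends in one process over 127.0.0.1, so it only exercises the code, not your network.

```python
import socket
import threading
import time

CHUNK = b"\0" * 65536  # 64 KiB payload per send call


def drain(server: socket.socket) -> None:
    """Receiver side: accept one connection and discard all data until EOF."""
    conn, _ = server.accept()
    with conn:
        while conn.recv(65536):
            pass


def send_and_time(host: str, port: int, total_bytes: int) -> float:
    """Sender side: push total_bytes to host:port and return Mbit/s.

    Timing stops once the socket is closed, so up to a socket buffer's worth
    of data may still be in flight; larger transfers shrink that error.
    """
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            sock.sendall(CHUNK)
            sent += len(CHUNK)
    elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1e6


def loopback_demo(total_bytes: int = 32 * 1024 * 1024) -> float:
    """Run both sides in one process over 127.0.0.1 (tests the code, not the network)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
    server.listen(1)
    receiver = threading.Thread(target=drain, args=(server,))
    receiver.start()
    try:
        return send_and_time("127.0.0.1", server.getsockname()[1], total_bytes)
    finally:
        receiver.join()
        server.close()
```

For a real measurement, run the receiving side (`drain` on a socket bound to that node's address) on one node and call `send_and_time` with that address from another; a figure well below the links' rated speed would be worth chasing.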
2013/10/24 Harry Waye <[EMAIL PROTECTED]>

> Excuse the ignorance, RCP?
>
>
> On 24 October 2013 22:28, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>
> > Your nodes are almost 50% idle... It might be something else. It seems
> > it's neither your disks nor your CPU... Maybe too many RPCs?
> >
> > Have you investigated the network side? netperf might help you here.
> >
> > JM
> >
> >
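JM's reasoning here (lots of idle time and little I/O wait, so neither CPU nor disk looks like the bottleneck) can be made explicit from the kernel counters in /proc/stat that tools like vmstat summarise. A minimal sketch, assuming the Linux /proc/stat layout documented in proc(5): take two samples of the aggregate `cpu` line a few seconds apart and turn the counter deltas into percentages. The helper names and the 40% idle / 5% iowait thresholds are hypothetical rules of thumb, not values from the thread.

```python
from typing import Dict, List

# Column order of the aggregate "cpu" line in /proc/stat, per proc(5).
# guest and guest_nice follow, but the kernel already counts them in user/nice.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> List[int]:
    """Return the eight counters (in clock ticks) from a /proc/stat 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(v) for v in parts[1:1 + len(FIELDS)]]


def cpu_breakdown(before: str, after: str) -> Dict[str, float]:
    """Percentage of CPU time spent in each state between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


def neither_cpu_nor_disk_bound(pct: Dict[str, float],
                               min_idle: float = 40.0,
                               max_iowait: float = 5.0) -> bool:
    """Hypothetical rule of thumb: plenty of idle time and little I/O wait."""
    return pct["idle"] >= min_idle and pct["iowait"] <= max_iowait


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()
```

On a node under load, call `read_cpu_line()` twice about two seconds apart (the same spacing as `vmstat 2`) and pass both lines to `cpu_breakdown`.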
> > 2013/10/24 Harry Waye <[EMAIL PROTECTED]>
> >
> > > p.s. I guess this is turning more into a general Hadoop issue, but I'll
> > > keep the discussion here since I have an audience, unless there are
> > > objections.
> > >
> > >
> > > On 24 October 2013 22:02, Harry Waye <[EMAIL PROTECTED]> wrote:
> > >
> > > > So just a short update; I'll read into it a little more tomorrow. This
> > > > is from three of the nodes:
> > > > https://gist.github.com/hazzadous/1264af7c674e1b3cf867
> > > >
> > > > The first is the grey guy. Just glancing at it, it seems to fluctuate
> > > > more than the others. I guess that could suggest there are some issues
> > > > with reading from the disks. Interestingly, it's the only node that
> > > > doesn't have smartd installed, which alerts us to changes on the other
> > > > nodes. I suspect there's probably some mileage in checking its SMART
> > > > attributes. Will do that tomorrow, though.
> > > >
> > > > Out of curiosity, how do people normally monitor disk issues? I'm going
> > > > to set up collectd to push various things from smartctl tomorrow. At
> > > > the moment all we do is receive emails, which are mostly noise about
> > > > problem sector counts increasing by 1.
> > > >
> > > >
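For the monitoring question, one hedged approach is a small script run by collectd's exec plugin, which reads `PUTVAL` lines from a script's standard output. The sketch below parses the attribute table that `smartctl -A` prints for ATA drives, emits one gauge per watched attribute, and reports only increases above a threshold, so alerting can ignore single-sector +1 changes. The attribute selection, the `exec-smart` plugin name, the 60-second interval, the threshold of 5 and the sample text are all illustrative assumptions; the sample is not output from the nodes in this thread.

```python
from typing import Dict, List

# Attributes commonly watched for failing sectors (an assumed selection).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

# Abbreviated, illustrative `smartctl -A` output (not from this thread).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def parse_smart_attributes(text: str) -> Dict[str, int]:
    """Map ATTRIBUTE_NAME -> RAW_VALUE for rows whose raw value is an integer."""
    attrs: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attrs[parts[1]] = int(parts[9])
    return attrs


def to_putval(attrs: Dict[str, int], host: str, interval: int = 60) -> List[str]:
    """Format watched attributes as collectd plain-text PUTVAL lines."""
    return [
        f'PUTVAL "{host}/exec-smart/gauge-{name}" interval={interval} N:{value}'
        for name, value in sorted(attrs.items())
        if name in WATCHED
    ]


def significant_increases(prev: Dict[str, int], curr: Dict[str, int],
                          min_increase: int = 5) -> Dict[str, int]:
    """Watched attributes (present in both readings) that rose by >= min_increase."""
    common = curr.keys() & prev.keys() & WATCHED
    return {name: curr[name] - prev[name] for name in common
            if curr[name] - prev[name] >= min_increase}
```

In a real deployment the script would run `smartctl -A` on each disk through `subprocess`, take the host name from the `COLLECTD_HOSTNAME` environment variable that the exec plugin sets, and persist the previous readings between runs.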
> > > > On 24 October 2013 19:40, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Can you try vmstat 2? The 2 is the interval, in seconds, between
> > > >> reports. In the extract here, nothing is running: only 8% is used
> > > >> (1% I/O wait, 6% user, 1% sys).
> > > >>
> > > >> Run it on 2 or 3 different nodes while you are putting the load on the
> > > >> cluster, and take a look at the last 4 numbers. What is the value of
> > > >> the last one?
> > > >>
> > > >> On the usercpu0 graph, who is the gray guy showing high?
> > > >>
> > > >> JM
> > > >>
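The "last 4 numbers" JM refers to are the `us`, `sy`, `id` and `wa` columns in this vmstat output, and `wa` (time spent waiting on I/O) is the one that points at a disk bottleneck. A minimal sketch, with hypothetical helper names, that parses vmstat output by column name and averages the CPU columns; looking columns up by name rather than position is a deliberate choice, since some vmstat versions print an extra `st` (steal) column after `wa`. The sample is the vmstat line Harry posted in this thread.

```python
from typing import Dict, List

# The vmstat output posted in this thread (one report, no `st` column).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 6  0      0 392448 524668 43823900    0    0   501  1044    0    0  6  1 91  1
"""


def parse_vmstat(text: str) -> List[Dict[str, int]]:
    """Turn vmstat output into one {column: value} dict per report line."""
    header: List[str] = []
    rows: List[Dict[str, int]] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "procs":
            continue                      # blank line or group banner
        if parts[0] == "r":
            header = parts                # column names; vmstat may repeat them
        elif header and parts[0].isdigit():
            rows.append(dict(zip(header, map(int, parts))))
    return rows


def cpu_summary(rows: List[Dict[str, int]], skip_first: bool = False) -> Dict[str, float]:
    """Average the us/sy/id/wa columns over the given reports.

    vmstat's first report is an average since boot, so pass skip_first=True
    when feeding a live `vmstat 2` capture.
    """
    if skip_first:
        rows = rows[1:]
    if not rows:
        raise ValueError("no vmstat reports to summarise")
    return {col: sum(r[col] for r in rows) / len(rows) for col in ("us", "sy", "id", "wa")}


if __name__ == "__main__":
    summary = cpu_summary(parse_vmstat(SAMPLE))
    print(summary)  # {'us': 6.0, 'sy': 1.0, 'id': 91.0, 'wa': 1.0}
    print(summary["us"] + summary["sy"] + summary["wa"])  # 8.0, JM's "only 8% is used"
```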
> > > >> 2013/10/24 Harry Waye <[EMAIL PROTECTED]>
> > > >>
> > > >> > OK, I'm running a load job atm. I've added some possibly
> > > >> > incomprehensible coloured lines to the graph: http://goo.gl/cUGCGG
> > > >> >
> > > >> > This is actually with one node fewer, since one was decommissioned to
> > > >> > replace a disk, which I guess is why one squiggly line shows no disk
> > > >> > activity. I've included only the CPU stats for CPU0 from each node.
> > > >> > The last graph should read "Memory Used". vmstat from one of the
> > > >> > nodes:
> > > >> >
> > > >> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> > > >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> > > >> >  6  0      0 392448 524668 43823900    0    0   501  1044    0    0  6  1 91  1
> > > >> >
> > > >> > To me the wait doesn't seem that high. Job stats are at
> > > >> > http://goo.gl/ZYdUKp, and the job setup is at
> > > >> > https://gist.github.com/hazzadous/ac57a384f2ab685f07f6
> > > >> >
> > > >> > Does anything jump out at you?
> > > >> >
> > > >> > Cheers
> > > >> > H
> > > >> >
> > > >> >
> > > >> > On 24 October 2013 16:16, Harry Waye <[EMAIL PROTECTED]> wrote:
> > > >> >
> > > >> > > Hi JM
> > > >> > >
> > > >> > > I took a snapshot on the initial run, before the changes:
> > > >> > >
> > > >> > > https://www.evernote.com/shard/s95/sh/b8e1516d-7c49-43f0-8b5f-d16bbdd3fe13/00d7c6cd6dd9fba92d6f00f90fb54fc1/res/4f0e20a2-1ecb-4085-8bc8-b3263c23afb5/screenshot.png