HBase user mailing list
Thread: HBase Thrift inserts bottlenecked somewhere -- but where?


Messages in this thread:
Dan Crosta 2013-03-01, 12:17
Asaf Mesika 2013-03-01, 14:13
Dan Crosta 2013-03-01, 14:17
Jean-Daniel Cryans 2013-03-01, 17:33
Varun Sharma 2013-03-01, 18:46
Dan Crosta 2013-03-01, 18:49
Ted Yu 2013-03-01, 18:52
Varun Sharma 2013-03-01, 19:01
Ted Yu 2013-03-02, 03:53
Dan Crosta 2013-03-02, 17:15
lars hofhansl 2013-03-02, 03:42
Dan Crosta 2013-03-02, 17:12
lars hofhansl 2013-03-02, 17:38
Dan Crosta 2013-03-02, 18:47
Asaf Mesika 2013-03-02, 19:56
Ted Yu 2013-03-02, 20:02
lars hofhansl 2013-03-02, 20:50
Dan Crosta 2013-03-02, 22:29

Re: HBase Thrift inserts bottlenecked somewhere -- but where?
What is the size of your writes?

On Sat, Mar 2, 2013 at 2:29 PM, Dan Crosta <[EMAIL PROTECTED]> wrote:

> Hm. This could be part of the problem in our case. Unfortunately we don't
> have very good control over which rowkeys will come from which workers
> (we're not using map-reduce or anything like it where we have that sort of
> control, at least not without some changes). But this is valuable
> information for future developments, thanks for mentioning it.
>
> On Mar 2, 2013, at 2:56 PM, Asaf Mesika wrote:
>
> > Make sure you are not sending a lot of Puts for the same rowkey. This can
> > cause contention on the region server side. We fixed that in our project
> > by aggregating all the columns for the same rowkey into a single Put
> > object, so that when sending a List of Puts, each Put has a unique rowkey.
> >
> > On Saturday, March 2, 2013, Dan Crosta wrote:
> >
> >> On Mar 2, 2013, at 12:38 PM, lars hofhansl wrote:
> >>> "That's only true from the HDFS perspective, right? Any given region is
> >>> "owned" by 1 of the 6 regionservers at any given time, and writes are
> >>> buffered to memory before being persisted to HDFS, right?"
> >>>
> >>> Only if you disabled the WAL; otherwise each change is written to the
> >>> WAL first, and then committed to the memstore.
> >>> So in that sense it's even worse: each edit is written twice to the FS,
> >>> replicated 3 times, and all of that on only 6 data nodes.
> >>
> >> Are these writes synchronized somehow? Could there be a locking problem
> >> somewhere that wouldn't show up as utilization of disk or cpu?
> >>
> >> What is the upshot of disabling the WAL -- I assume it means that if a
> >> RegionServer crashes, you lose any writes it has in memory that have not
> >> yet been committed to HFiles?
> >>
> >>
> >>> 20k writes does seem a bit low.
> >>
> >> I adjusted dfs.datanode.handler.count from 3 to 10 and now we're up to
> >> about 22-23k writes per second, but still no apparent contention for any
> >> of the basic system resources.
> >>
> >> Any other suggestions on things to try?
> >>
> >> Thanks,
> >> - Dan
>
>
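To make Asaf's suggestion above concrete: in the Java client, aggregating all columns for a rowkey into a single Put before sending the batch might look roughly like the sketch below. The table name, column family, and the Edit holder class are placeholders for illustration, not details from this thread, and the API shown is the 0.94-era client.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class GroupPutsByRowkey {

    // Placeholder for one incoming (rowkey, qualifier, value) triple.
    static class Edit {
        final byte[] row, qualifier, value;
        Edit(byte[] row, byte[] qualifier, byte[] value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Build one Put per unique rowkey, aggregating every column for that rowkey,
    // so the batch sent to the server never contains two Puts with the same row.
    static List<Put> groupByRowkey(List<Edit> edits, byte[] family) {
        Map<String, Put> putsByRow = new LinkedHashMap<String, Put>();
        for (Edit e : edits) {
            String key = Bytes.toString(e.row);
            Put p = putsByRow.get(key);
            if (p == null) {
                p = new Put(e.row);
                putsByRow.put(key, p);
            }
            p.add(family, e.qualifier, e.value);   // addColumn(...) in newer clients
        }
        return new ArrayList<Put>(putsByRow.values());
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");            // table name is a placeholder
        List<Edit> edits = new ArrayList<Edit>();              // filled by the application
        table.put(groupByRowkey(edits, Bytes.toBytes("d")));   // family "d" is a placeholder
        table.close();
    }
}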
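On lars's WAL point: each edit going to the WAL and then to the memstore is what doubles the filesystem writes, and skipping the WAL is exactly the trade-off Dan asks about -- a crashed region server loses whatever was only in its memstore and not yet flushed to HFiles. In the 0.94-era client this is a per-Put flag; the helper below is only a sketch, with placeholder row/family/qualifier arguments.

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalPut {
    static Put buildNoWalPut(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
        Put p = new Put(row);
        p.add(family, qualifier, value);   // addColumn(...) in newer clients
        // Skip the write-ahead log for this edit: higher throughput, but if the
        // region server crashes, edits still in the memstore (not yet flushed
        // to HFiles) are lost.
        p.setWriteToWAL(false);            // later clients: setDurability(Durability.SKIP_WAL)
        return p;
    }
}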
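As for the dfs.datanode.handler.count change Dan describes, that property is set in hdfs-site.xml on each datanode and takes effect after a datanode restart; roughly:

<!-- hdfs-site.xml on each datanode; restart the datanodes for it to take effect -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>
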
Later messages in this thread:
Dan Crosta 2013-03-03, 13:53
lars hofhansl 2013-03-02, 20:56
Andrew Purtell 2013-03-05, 07:04