HBase >> mail # user >> HBase random read performance


Ankit Jain 2013-04-13, 05:31
Ted Yu 2013-04-13, 15:16
Adrien Mogenet 2013-04-13, 16:00
Harsh J 2013-04-13, 17:02
Jean-Marc Spaggiari 2013-04-14, 21:58
Anoop Sam John 2013-04-15, 10:17
Rishabh Agrawal 2013-04-15, 10:42
Ankit Jain 2013-04-15, 10:53
谢良 2013-04-15, 11:41
Ankit Jain 2013-04-15, 13:04
Doug Meil 2013-04-15, 13:21
Ted Yu 2013-04-15, 13:30
Ted Yu 2013-04-15, 14:13
Ted Yu 2013-04-15, 17:03
lars hofhansl 2013-04-16, 14:55
Liu, Raymond 2013-04-16, 07:49
Nicolas Liochon 2013-04-16, 08:22
Re: Reply: HBase random read performance
Hi Nicolas,

I think it might be good to create a JIRA for that anyway, since it seems
that some users are expecting this behaviour.

My 2¢ ;)

JM

2013/4/16 Nicolas Liochon <[EMAIL PROTECTED]>

> I think there is something in the middle that could be done. It was
> discussed here a while ago, but without any JIRA created.  See thread:
>
> http://mail-archives.apache.org/mod_mbox/hbase-user/201302.mbox/%3CCAKxWWm19OC+dePTK60bMmcecv=7tC+[EMAIL PROTECTED]%3E
>
> If someone can spend some time on it, I can create the JIRA...
>
> Nicolas
>
>
> On Tue, Apr 16, 2013 at 9:49 AM, Liu, Raymond <[EMAIL PROTECTED]>
> wrote:
>
> > So what is lacking here? Should the actions also be parallelized inside
> > the RS for each region, instead of just in parallel at the RS level?
> > It seems this would be rather difficult to implement, and for Get it
> > might not be worth it?
> >
> > >
> > > I looked at
> > > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> > > in 0.94.
> > >
> > > In processBatchCallback(), starting at line 1538:
> > >
> > >         // step 1: break up into regionserver-sized chunks and build
> > >         // the data structs
> > >         Map<HRegionLocation, MultiAction<R>> actionsByServer =
> > >           new HashMap<HRegionLocation, MultiAction<R>>();
> > >         for (int i = 0; i < workingList.size(); i++) {
> > >
> > > So we do group individual actions by server.
> > >
> > > FYI
> > >
> > > On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > Doug made a good point.
> > > >
> > > > Take a look at the performance gain for parallel scan (bottom chart
> > > > compared to top chart):
> > > >
> > > > https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
> > > >
> > > > See
> > > > https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
> > > > for an explanation of the two methods.
> > > >
> > > > Cheers
> > > >
> > > > On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
> > > >
> > > >>
> > > >> Hi there, regarding this...
> > > >>
> > > > >> > We are passing random 10000 row-keys as input, while HBase is
> > > > >> > taking around 17 secs to return 10000 records.
> > > >>
> > > >>
> > > >> ….  Given that you are generating 10,000 random keys, your multi-get
> > > >> is very likely hitting all 5 nodes of your cluster.
> > > >>
> > > >>
> > > > >> Historically, multi-Get used to first sort the requests by RS and
> > > > >> then *serially* go to each RS to process the multi-Get.  I'm not
> > > > >> sure whether the current (0.94.x) client multi-threads this or not.
> > > >>
> > > >> One thing you might want to consider is confirming that client
> > > >> behavior, and if it's not multi-threading then perform a test that
> > > >> does the same RS sorting via...
> > > >>
> > > >>
> > > >>
> > > > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
> > > >>
> > > > >> …. and then spin up your own threads (one per target RS) and see
> > > > >> what happens.
> > > >>
> > > >>
> > > >>
> > > >> On 4/15/13 9:04 AM, "Ankit Jain" <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >> >Hi Liang,
> > > >> >
> > > > >> >Thanks for the reply.
> > > >> >
> > > >> >Ans1:
> > > > >> >I tried using an HFile block size of 32 KB with the bloom filter
> > > > >> >enabled. The random read performance is 10000 records in 23 secs.
> > > >> >
> > > >> >Ans2:
> > > >> >We are retrieving all the 10000 rows in one call.
> > > >> >
> > > >> >Ans3:
> > > > >> >Disk details:
> > > >> >Model Number:       ST2000DM001-1CH164
> > > >> >Serial Number:      Z1E276YF
> > > >> >
> > > > >> >Please suggest some more optimizations.
> > > >> >
> > > >> >Thanks,
> > > >> >Ankit Jain
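
For anyone who wants to run the test Doug describes in the quoted thread (sort the row keys by region server, then use one thread per target RS), here is a minimal sketch of that pattern in plain Java. It is not HBase client code: the class name GroupedMultiGet and the two callbacks are hypothetical. The assumption is that `locate` would be backed by something like HTable.getRegionLocation(byte[]) and `fetch` by a batched Get against the table. Note that, per Ted's reading of processBatchCallback() in 0.94, the client already groups actions by server, so this is only a way to compare timings, not a guaranteed improvement.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;
import java.util.function.Function;

public class GroupedMultiGet {

    /**
     * Groups keys by the server that hosts them, then fetches each group on
     * its own thread.
     *
     * @param locate hypothetical stand-in for a region-location lookup
     * @param fetch  hypothetical stand-in for a batched get against one server
     * @return all results, ordered by server (first-seen order), not by input order
     */
    public static <K, S, V> List<V> fetchInParallel(
            List<K> keys,
            Function<K, S> locate,
            BiFunction<S, List<K>, List<V>> fetch)
            throws InterruptedException, ExecutionException {

        // Step 1: bucket the keys by target server.
        Map<S, List<K>> keysByServer = new LinkedHashMap<>();
        for (K key : keys) {
            keysByServer
                .computeIfAbsent(locate.apply(key), s -> new ArrayList<>())
                .add(key);
        }
        if (keysByServer.isEmpty()) {
            return new ArrayList<>();
        }

        // Step 2: one worker thread per target server.
        ExecutorService pool = Executors.newFixedThreadPool(keysByServer.size());
        try {
            List<Future<List<V>>> futures = new ArrayList<>();
            for (Map.Entry<S, List<K>> entry : keysByServer.entrySet()) {
                Callable<List<V>> task =
                    () -> fetch.apply(entry.getKey(), entry.getValue());
                futures.add(pool.submit(task));
            }

            // Step 3: wait for every server's batch and merge the results.
            List<V> results = new ArrayList<>();
            for (Future<List<V>> future : futures) {
                results.addAll(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing this against a single multi-Get call with the same 10000 keys should show whether the client is already parallelizing across region servers. It does nothing about parallelism inside a single RS, which is the gap Raymond and Nicolas discuss above.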
Michel Segel 2013-04-17, 12:33
Håvard Wahl Kongsgård 2013-04-14, 22:19
Mohammad Tariq 2013-04-14, 22:39
Ted Yu 2013-07-08, 12:49