HBase, mail # user - HBase random read performance


Re: Reply: HBase random read performance
Nicolas Liochon 2013-04-16, 08:22
I think there is something in the middle that could be done. It was
discussed here a while ago, but without any JIRA created.  See thread:
http://mail-archives.apache.org/mod_mbox/hbase-user/201302.mbox/%3CCAKxWWm19OC+dePTK60bMmcecv=7tC+[EMAIL PROTECTED]%3E

If someone can spend some time on it, I can create the JIRA...

Nicolas
On Tue, Apr 16, 2013 at 9:49 AM, Liu, Raymond <[EMAIL PROTECTED]> wrote:

> So what is lacking here? Should the actions also run in parallel inside the
> RS for each region, instead of only in parallel at the RS level?
> That seems rather difficult to implement and, for Get, might not be
> worth it.
>
> >
> > I looked at
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> > in 0.94
> >
> > In processBatchCallback(), starting line 1538,
> >
> >         // step 1: break up into regionserver-sized chunks and build the data structs
> >         Map<HRegionLocation, MultiAction<R>> actionsByServer =
> >           new HashMap<HRegionLocation, MultiAction<R>>();
> >         for (int i = 0; i < workingList.size(); i++) {
> >
> > So we do group individual actions by server.
> >
> > FYI
> >
> > On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Doug made a good point.
> > >
> > > Take a look at the performance gain for parallel scan (bottom chart
> > > compared to top chart):
> > > https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
> > >
> > > See
> > > https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
> > > for an explanation of the two methods.
> > >
> > > Cheers
> > >
> > > On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
> > >
> > >>
> > >> Hi there, regarding this...
> > >>
> > >> > We are passing random 10000 row-keys as input, while HBase is taking
> > >> > around 17 secs to return 10000 records.
> > >>
> > >>
> > >> ….  Given that you are generating 10,000 random keys, your multi-get
> > >> is very likely hitting all 5 nodes of your cluster.
> > >>
> > >>
> > >> Historically, multi-Get first sorted the requests by RS and then went
> > >> to each RS *serially* to process the multi-Get.  I'm not sure whether
> > >> the current (0.94.x) client multi-threads this or not.
> > >>
> > >> One thing you might want to consider is confirming that client
> > >> behavior, and if it's not multi-threading then perform a test that
> > >> does the same RS sorting via...
> > >>
> > >>
> > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
> > >>
> > >> …. and then spin up your own threads (one per target RS) and see what
> > >> happens.
> > >>
> > >>
> > >>
> > >> On 4/15/13 9:04 AM, "Ankit Jain" <[EMAIL PROTECTED]> wrote:
> > >>
> > >> >Hi Liang,
> > >> >
> > >> >Thanks for the reply.
> > >> >
> > >> >Ans1:
> > >> >I tried an HFile block size of 32 KB with the bloom filter enabled.
> > >> >The random read performance is 10000 records in 23 secs.
> > >> >
> > >> >Ans2:
> > >> >We are retrieving all the 10000 rows in one call.
> > >> >
> > >> >Ans3:
> > >> >Disk details:
> > >> >Model Number:       ST2000DM001-1CH164
> > >> >Serial Number:      Z1E276YF
> > >> >
> > >> >Please suggest some more optimizations.
> > >> >
> > >> >Thanks,
> > >> >Ankit Jain
> > >> >
> > >> >On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <[EMAIL PROTECTED]> wrote:
> > >> >
> > >> >> First, setting the block size to 4 KB probably won't help; please
> > >> >> refer to the beginning of HFile.java:
> > >> >>
> > >> >>  Smaller blocks are good
> > >> >>  * for random access, but require more memory to hold the block index, and may
> > >> >>  * be slower to create (because we must flush the compressor