HBase, mail # user - Re: Get on a row with multiple columns


Re: Get on a row with multiple columns
Ted Yu 2013-02-09, 05:55
For the 300 column deletes, can you show us how the Delete(s) are
constructed?

Do you use this method?

  public Delete deleteColumns(byte [] family, byte [] qualifier)

Thanks
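
For readers following the thread, the distinction behind this question can be sketched with a toy model. As I understand the client API of that era, deleteColumn() removes only the newest version of a column, while deleteColumns() removes all versions, so only the former needs the per-column timestamp lookup described in the quoted message below. The class and method names here (DeleteSketch, deleteLatestVersion, deleteAllVersions, lookups) are hypothetical and stand in for the real calls; this is not HBase code.

```java
import java.util.TreeMap;

/** Toy model of one column's versions. Hypothetical names; not HBase code. */
class DeleteSketch {
    /** timestamp -> value; the highest timestamp is the newest version. */
    final TreeMap<Long, String> versions = new TreeMap<>();
    /** Reads performed to find a timestamp before deleting. */
    int lookups = 0;

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Models deleteColumn(): only the newest version is removed. */
    void deleteLatestVersion() {
        lookups++; // the newest timestamp has to be looked up first
        if (!versions.isEmpty()) {
            versions.remove(versions.lastKey());
        }
    }

    /** Models deleteColumns(): every version is removed, no lookup needed. */
    void deleteAllVersions() {
        versions.clear();
    }

    public static void main(String[] args) {
        DeleteSketch column = new DeleteSketch();
        column.put(1L, "old");
        column.put(2L, "new");
        column.deleteLatestVersion();
        System.out.println(column.versions + " after " + column.lookups + " lookup(s)");
        column.deleteAllVersions();
        System.out.println(column.versions + " after " + column.lookups + " lookup(s)");
    }
}
```

In this model, deleting 300 columns in the first style implies 300 lookups, and in the second style none.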

On Fri, Feb 8, 2013 at 9:44 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> So a Get call with multiple columns on a single row should be much faster
> than independent Get(s) on each of those columns for that row. I am
> seeing severely poor performance (~15 seconds) for certain
> deleteColumn() calls, and I see a prepareDeleteTimestamps() function in
> HRegion.java which first tries to locate each column by doing an
> individual get for every column you want to delete (I am doing 300 column
> deletes). I think this should ideally be one get call with the batch of
> 300 columns, so that a single scan retrieves the columns and only the
> columns that are found are deleted.
>
> Before I try this fix, I wanted an opinion on whether batching the get()
> will make a difference, and from your answer it seems it should.
>
> On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Everything is stored as a KeyValue in HBase.
> > The Key part of a KeyValue contains the row key, column family, column
> > name, and timestamp in that order.
> > Each column family has its own store and store files.
> >
> > So in a nutshell, a get is executed by starting a scan at the row key
> > (which is a prefix of the key) in each store (CF) and then scanning
> > forward in each store until the next row key is reached. (In reality it
> > is a bit more complicated due to multiple versions, skipping columns, etc.)
> >
> >
> > -- Lars
> > ________________________________
> > From: Varun Sharma <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Friday, February 8, 2013 9:22 PM
> > Subject: Re: Get on a row with multiple columns
> >
> > Sorry, I was a little unclear with my question.
> >
> > Let's say you have
> >
> > Get get = new Get(row);
> > get.addColumn(family, Bytes.toBytes("1"));  // family: placeholder byte[] column family
> > get.addColumn(family, Bytes.toBytes("2"));
> > .
> > .
> > .
> >
> > When HBase internally executes the batch get, it will seek to column "1".
> > Since data is lexicographically sorted, it does not need to seek from
> > the beginning to get to "2"; it can continue seeking forward, since
> > column "2" will always come after column "1". I want to know whether
> > this is how a multi-column get on a row works or not.
> >
> > Thanks
> > Varun
> >
> > On Fri, Feb 8, 2013 at 9:08 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:
> >
> > > Like Ishan said, a get returns an instance of the Result class.
> > > The utility methods you can use are:
> > >  byte[] getValue(byte[] family, byte[] qualifier)
> > >  byte[] value()
> > >  byte[] getRow()
> > >  int size()
> > >  boolean isEmpty()
> > >  KeyValue[] raw() # Like Ishan said, all data here is sorted
> > >  List<KeyValue> list()
> > >
> > >
> > >
> > >
> > > On 02/08/2013 11:29 PM, Ishan Chhabra wrote:
> > >
> > >> Based on what I read in Lars' book, a get will return a Result,
> > >> which is internally a KeyValue[]. This KeyValue[] is sorted by the key,
> > >> and you access this array using the raw or list methods on the Result
> > >> object.
> > >>
> > >>
> > >> On Fri, Feb 8, 2013 at 5:40 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> +user
> > >>>
> > >>> On Fri, Feb 8, 2013 at 5:38 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> Hi,
> > >>>>
> > >>>> When I do a Get on a row with multiple column qualifiers, do we sort
> > >>>> the column qualifiers and make use of the sorted order when we get
> > >>>> the results?
> > >>>>
> > >>>> Thanks
> > >>>> Varun
> > >>
> > >>
> > > --
> > > Marcos Ortiz Valmaseda,
> > > Product Manager && Data Scientist at UCI
> > > Blog: http://marcosluis2186.posterous.com
> > > Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>