

Re: Get on a row with multiple columns
So a Get call with multiple columns on a single row should be much faster
than independent Gets on each of those columns for that row. I am
seeing very poor performance (~15 seconds) for certain
deleteColumn() calls, and I see that there is a
prepareDeleteTimestamps() function in HRegion.java which first tries to
locate each column by doing an individual get on every column you want to
delete (I am doing 300 column deletes). I think this should ideally be
one get call with the batch of 300 columns, so that a single scan retrieves
the columns, and the columns that are found are then deleted.
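
To make the idea concrete, here is a rough sketch of why one batched lookup over sorted columns should touch far fewer cells than 300 independent lookups. This is a toy model, not HBase code: the class BatchedGetSketch, its methods, and the "cells visited" counter are all made up for illustration, and a real store seeks through block indexes rather than walking a list, so only the relative shape of the cost is meaningful.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only: one row in one store, held as a sorted list of qualifiers.
// None of these names come from HBase.
class BatchedGetSketch {
    // Cells examined since the counter was last reset; a crude stand-in for scan cost.
    static long cellsVisited = 0;

    // Independent lookup: start at the beginning of the row and scan forward.
    static boolean getOne(List<String> sortedRow, String qualifier) {
        for (String cell : sortedRow) {
            cellsVisited++;
            int cmp = cell.compareTo(qualifier);
            if (cmp == 0) return true;
            if (cmp > 0) return false; // passed the slot where it would sort
        }
        return false;
    }

    // Batched lookup: sort the wanted qualifiers, then make one forward pass.
    static List<String> getMany(List<String> sortedRow, List<String> wanted) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        for (String q : new TreeSet<>(wanted)) {
            // Never move backwards: the next qualifier sorts after the last one.
            while (pos < sortedRow.size() && sortedRow.get(pos).compareTo(q) < 0) {
                pos++;
                cellsVisited++;
            }
            if (pos < sortedRow.size() && sortedRow.get(pos).equals(q)) {
                found.add(q);
            }
        }
        return found;
    }

    // Builds qualifiers c0000, c0001, ... which sort lexicographically.
    static List<String> makeRow(int n) {
        List<String> row = new ArrayList<>();
        for (int i = 0; i < n; i++) row.add(String.format("c%04d", i));
        return row;
    }

    // Picks every third qualifier, as a stand-in for the columns to delete.
    static List<String> everyThird(int count) {
        List<String> wanted = new ArrayList<>();
        for (int i = 0; i < count; i++) wanted.add(String.format("c%04d", i * 3));
        return wanted;
    }

    public static void main(String[] args) {
        List<String> row = makeRow(1000);
        List<String> wanted = everyThird(300);

        cellsVisited = 0;
        for (String q : wanted) getOne(row, q);
        System.out.println("independent gets visited " + cellsVisited + " cells");

        cellsVisited = 0;
        List<String> found = getMany(row, wanted);
        System.out.println("batched get visited " + cellsVisited
                + " cells and found " + found.size() + " columns");
    }
}
```

In this model the independent lookups each restart from the front of the row, while the batched lookup only ever moves forward, which is the behavior I am hoping a single multi-column Get gives me.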

Before I try this fix, I wanted to get an opinion on whether batching the
get() will make a difference, and from your answer it seems it should.

On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Everything is stored as a KeyValue in HBase.
> The Key part of a KeyValue contains the row key, column family, column
> name, and timestamp in that order.
> Each column family has its own store and store files.
>
> So in a nutshell a get is executed by starting a scan at the row key
> (which is a prefix of the key) in each store (CF) and then scanning forward
> in each store until the next row key is reached. (in reality it is a bit
> more complicated due to multiple versions, skipping columns, etc)
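
A small sketch of the ordering described above, for anyone following along. It is only a model under stated assumptions: the Key class, the ORDER comparator, and getRow are hypothetical names, strings stand in for the byte[] fields, and a single TreeSet stands in for one store. It also assumes that newer timestamps sort first within a column. A get for a row is modeled as seeking to the first key with that row prefix and scanning forward until the row changes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Illustrative model only; these are not HBase classes.
class KeyOrderSketch {
    static final class Key {
        final String row, family, qualifier;
        final long timestamp;

        Key(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "@" + timestamp;
        }
    }

    // Row, then family, then qualifier ascending; timestamp descending
    // (assumption: newest version of a column comes first).
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    // Models a get: seek to the row prefix, scan until the next row starts.
    static List<Key> getRow(TreeSet<Key> store, String row) {
        Key start = new Key(row, "", "", Long.MAX_VALUE); // sorts before any key in the row
        List<Key> out = new ArrayList<>();
        for (Key k : store.tailSet(start, true)) {
            if (!k.row.equals(row)) break; // reached the next row key
            out.add(k);
        }
        return out;
    }

    // A tiny store with three rows in one column family.
    static TreeSet<Key> sampleStore() {
        TreeSet<Key> store = new TreeSet<>(ORDER);
        store.add(new Key("r3", "cf", "a", 1));
        store.add(new Key("r2", "cf", "b", 7));
        store.add(new Key("r2", "cf", "a", 10));
        store.add(new Key("r1", "cf", "a", 5));
        store.add(new Key("r2", "cf", "a", 20));
        return store;
    }

    public static void main(String[] args) {
        System.out.println(getRow(sampleStore(), "r2"));
    }
}
```

Because everything in the row comes back in key order, columns are already sorted by qualifier when the scan returns them.
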
>
>
> -- Lars
> ________________________________
> From: Varun Sharma <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, February 8, 2013 9:22 PM
> Subject: Re: Get on a row with multiple columns
>
> Sorry, I was a little unclear with my question.
>
> Let's say you have
>
> Get get = new Get(row);
> get.addColumn("1");
> get.addColumn("2");
> .
> .
> .
>
> When HBase executes the batch get internally, it will seek to column "1".
> Since data is lexicographically sorted, it does not need to seek from
> the beginning to get to "2"; it can keep seeking forward, since
> column "2" will always come after column "1". I want to know whether this is
> how a multi-column get on a row works or not.
>
> Thanks
> Varun
>
> On Fri, Feb 8, 2013 at 9:08 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:
>
> > Like Ishan said, a get returns an instance of the Result class.
> > The utility methods you can use are:
> >  byte[] getValue(byte[] family, byte[] qualifier)
> >  byte[] value()
> >  byte[] getRow()
> >  int size()
> >  boolean isEmpty()
> >  KeyValue[] raw() # Like Ishan said, all data here is sorted
> >  List<KeyValue> list()
> >
> >
> >
> >
> > On 02/08/2013 11:29 PM, Ishan Chhabra wrote:
> >
> >> Based on what I read in Lars' book, a get will return a Result,
> >> which is internally a KeyValue[]. This KeyValue[] is sorted by key, and
> >> you access this array using the raw or list methods on the Result object.
> >>
> >>
> >> On Fri, Feb 8, 2013 at 5:40 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >>
> >>  +user
> >>>
> >>> On Fri, Feb 8, 2013 at 5:38 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >>>
> >>>  Hi,
> >>>>
> >>>> When I do a Get on a row with multiple column qualifiers, do we sort the
> >>>> column qualifiers and make use of the sorted order when we get the
> >>>> results?
> >>>>
> >>>
> >>>> Thanks
> >>>> Varun
> >>>>
> >>>>
> >>
> >>
> > --
> > Marcos Ortiz Valmaseda,
> > Product Manager && Data Scientist at UCI
> > Blog: http://marcosluis2186.posterous.com
> > Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
> >
>