Re: Get on a row with multiple columns
So a Get call with multiple columns on a single row should be much faster
than independent Gets on each of those columns for that row. I am seeing
severely poor performance (~15 seconds) for certain deleteColumn() calls,
and there is a prepareDeleteTimestamps() function in HRegion.java which
first tries to locate each column by doing an individual get on every
column you want to delete (I am doing 300 column deletes). I think this
should ideally be one get call with the batch of 300 columns, so that a
single scan can retrieve the columns and only the columns that are found
are actually deleted.

Before I try this fix, I wanted to get an opinion on whether batching the
get() would make a difference, and from your answer it seems it should.
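
For reference, a minimal client-side sketch of the contrast in the first
paragraph -- one Get carrying many columns versus independent Gets -- against
the 0.94-era client API; the table, family, and qualifier names are made up:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedGetSketch {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable");   // hypothetical table name
            byte[] row = Bytes.toBytes("row1");
            byte[] cf = Bytes.toBytes("cf");              // hypothetical column family

            // One Get carrying all 300 qualifiers: a single lookup on the row.
            Get batched = new Get(row);
            for (int i = 0; i < 300; i++) {
                batched.addColumn(cf, Bytes.toBytes("col" + i));
            }
            Result result = table.get(batched);

            // The slow alternative: 300 independent Gets, each paying the full
            // per-call and per-lookup overhead.
            for (int i = 0; i < 300; i++) {
                Get single = new Get(row);
                single.addColumn(cf, Bytes.toBytes("col" + i));
                table.get(single);
            }
            table.close();
        }
    }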

On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Everything is stored as a KeyValue in HBase.
> The Key part of a KeyValue contains the row key, column family, column
> name, and timestamp in that order.
> Each column family has its own store and store files.
>
> So in a nutshell a get is executed by starting a scan at the row key
> (which is a prefix of the key) in each store (CF) and then scanning forward
> in each store until the next row key is reached. (in reality it is a bit
> more complicated due to multiple versions, skipping columns, etc)
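
A minimal sketch of that key ordering, using the KeyValue constructor and
comparator from the 0.94-era API; the row, family, and qualifier values here
are made up:

    import java.util.Arrays;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.util.Bytes;

    public class KeyOrderSketch {
        public static void main(String[] args) {
            byte[] row = Bytes.toBytes("row1");
            byte[] cf = Bytes.toBytes("cf");
            byte[] val = Bytes.toBytes("v");

            // Key part = row key + column family + column name + timestamp.
            KeyValue[] kvs = {
                new KeyValue(row, cf, Bytes.toBytes("b"), 1L, val),
                new KeyValue(row, cf, Bytes.toBytes("a"), 2L, val),
                new KeyValue(row, cf, Bytes.toBytes("c"), 3L, val),
            };

            // The store comparator orders qualifiers a, b, c within the row,
            // which is why a get can scan forward inside the row rather than
            // re-seeking for every column.
            Arrays.sort(kvs, KeyValue.COMPARATOR);
            for (KeyValue kv : kvs) {
                System.out.println(Bytes.toString(kv.getQualifier()));
            }
        }
    }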
>
>
> -- Lars
> ________________________________
> From: Varun Sharma <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, February 8, 2013 9:22 PM
> Subject: Re: Get on a row with multiple columns
>
> Sorry, I was a little unclear with my question.
>
> Let's say you have
>
> Get get = new Get(row)
> get.addColumn("1");
> get.addColumn("2");
> .
> .
> .
>
> When HBase internally executes the batch get, it will seek to column "1".
> Since the data is lexicographically sorted, it does not need to seek from
> the beginning to get to "2"; it can continue seeking forward, because
> column "2" will always come after column "1". I want to know whether this
> is how a multi-column get on a row works or not.
>
> Thanks
> Varun
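
As a side note, at least in the 0.94-era client the qualifiers added to a Get
are kept in a sorted set per family. A small sketch (the single-argument
addColumn above is shorthand; the method used below takes a family and a
qualifier, and the names here are made up):

    import java.util.Map;
    import java.util.NavigableSet;

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GetColumnOrderSketch {
        public static void main(String[] args) {
            byte[] cf = Bytes.toBytes("cf");           // hypothetical column family
            Get get = new Get(Bytes.toBytes("row1"));
            get.addColumn(cf, Bytes.toBytes("2"));     // added out of order on purpose
            get.addColumn(cf, Bytes.toBytes("1"));

            // Qualifiers are stored in a sorted set per family, so they come
            // back in lexicographic order regardless of insertion order.
            for (Map.Entry<byte[], NavigableSet<byte[]>> e : get.getFamilyMap().entrySet()) {
                for (byte[] qualifier : e.getValue()) {
                    System.out.println(Bytes.toString(qualifier));   // prints 1, then 2
                }
            }
        }
    }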
>
> On Fri, Feb 8, 2013 at 9:08 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:
>
> > Like Ishan said, a get gives you an instance of the Result class.
> > The utility methods that you can use are:
> >  byte[] getValue(byte[] family, byte[] qualifier)
> >  byte[] value()
> >  byte[] getRow()
> >  int size()
> >  boolean isEmpty()
> >  KeyValue[] raw() # Like Ishan said, all data here is sorted
> >  List<KeyValue> list()
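
A short usage sketch of those accessors against the 0.94-era client API; the
table, family, and qualifier names are made up:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ResultAccessSketch {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable");   // hypothetical table name
            byte[] cf = Bytes.toBytes("cf");              // hypothetical column family

            Get get = new Get(Bytes.toBytes("row1"));
            get.addColumn(cf, Bytes.toBytes("1"));
            get.addColumn(cf, Bytes.toBytes("2"));
            Result result = table.get(get);

            if (!result.isEmpty()) {
                // Direct lookup of a single cell's value.
                byte[] v1 = result.getValue(cf, Bytes.toBytes("1"));

                // raw() exposes the KeyValues, already sorted by key.
                for (KeyValue kv : result.raw()) {
                    System.out.println(Bytes.toString(kv.getQualifier())
                            + " = " + Bytes.toString(kv.getValue()));
                }
            }
            table.close();
        }
    }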
> >
> >
> >
> >
> > On 02/08/2013 11:29 PM, Ishan Chhabra wrote:
> >
> >> Based on what I read in Lars' book, a get will return a Result, which is
> >> internally a KeyValue[]. This KeyValue[] is sorted by the key, and you
> >> access this array using the raw() or list() methods on the Result object.
> >>
> >>
> >> On Fri, Feb 8, 2013 at 5:40 PM, Varun Sharma <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >>  +user
> >>>
> >>> On Fri, Feb 8, 2013 at 5:38 PM, Varun Sharma <[EMAIL PROTECTED]>
> >>> wrote:
> >>>
> >>>  Hi,
> >>>>
> >>>> When I do a Get on a row with multiple column qualifiers, do we sort
> >>>> the column qualifiers and make use of the sorted order when we get
> >>>> the results?
> >>>
> >>>> Thanks
> >>>> Varun
> >>>>
> >>>>
> >>
> >>
> > --
> > Marcos Ortiz Valmaseda,
> > Product Manager && Data Scientist at UCI
> > Blog: http://marcosluis2186.posterous.com
> > Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
> >
>