HBase user mailing list: Re: Get on a row with multiple columns


Varun Sharma 2013-02-09, 05:22
lars hofhansl 2013-02-09, 05:34
Varun Sharma 2013-02-09, 05:44
Ted Yu 2013-02-09, 05:55
Varun Sharma 2013-02-09, 06:05
lars hofhansl 2013-02-09, 06:33
Varun Sharma 2013-02-09, 06:45
Varun Sharma 2013-02-09, 06:57
lars hofhansl 2013-02-09, 07:31
lars hofhansl 2013-02-09, 07:41
lars hofhansl 2013-02-09, 07:57
Varun Sharma 2013-02-09, 08:05
Varun Sharma 2013-02-09, 08:11
lars hofhansl 2013-02-09, 08:17
Varun Sharma 2013-02-09, 08:29
Jean-Marc Spaggiari 2013-02-09, 13:02
lars hofhansl 2013-02-09, 16:46
Varun Sharma 2013-02-10, 22:35
Anoop Sam John 2013-02-11, 12:50
Varun Sharma 2013-02-11, 15:36
Varun Sharma 2013-02-11, 16:44
Varun Sharma 2013-02-11, 16:44

Re: Get on a row with multiple columns
Which HBase version are you using?

Is there a way to place only 10 delete markers from the application side
instead of 300?

Thanks
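One way to read the question above: issue a single read for the candidate columns first, then write markers only for the columns that came back. In the 0.94-era client that would be one Get with addColumn() per candidate, followed by one Delete with deleteColumns() for each column present in the Result. The sketch below models only the selection step with plain Java collections (no HBase classes); the qualifier names and the 10-of-300 row are hypothetical. Read-then-delete is not atomic: a column written between the Get and the Delete will not be deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not the HBase API: pick which delete markers to write
// after one read has told us which requested qualifiers exist.
class AppSideDeleteFilter {

    /** Requested qualifiers that are present in the row, sorted and de-duplicated. */
    static List<String> qualifiersToDelete(Set<String> existing, List<String> requested) {
        List<String> result = new ArrayList<>();
        for (String q : new TreeSet<>(requested)) {
            if (existing.contains(q)) {
                result.add(q);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical row: 300 candidate qualifiers, every 30th one actually stored.
        Set<String> existing = new TreeSet<>();
        List<String> requested = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            String q = String.format("q%03d", i);
            requested.add(q);
            if (i % 30 == 0) {
                existing.add(q);
            }
        }
        List<String> toDelete = qualifiersToDelete(existing, requested);
        System.out.println("markers without filtering: " + requested.size()); // 300
        System.out.println("markers with filtering:    " + toDelete.size());  // 10
    }
}
```

The cost moves from 300 markers that linger until a major compaction to one extra read per delete; which is cheaper depends on the workload.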

On Fri, Feb 8, 2013 at 10:05 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> We are given a set of 300 columns to delete. I tested two cases:
>
> 1) deleteColumns() - with the 's'
>
> This function simply adds delete markers for all 300 columns; in our case,
> typically only a fraction of these columns (about 10) are actually present.
> After we started using deleteColumns, we saw a drop in cluster-wide random
> read performance: the 90th percentile latency worsened, and so did the
> 99th, probably because reads have to traverse the delete markers. I
> attribute this to the profusion of delete markers in the cluster. Major
> compactions also slowed down by almost 50 percent, probably because they
> have to clean out significantly more delete markers.
>
> 2) deleteColumn()
>
> This ended up with intolerable 15-second calls, which clogged all the
> handlers and made the cluster pretty much unresponsive.
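A toy model of why marking all 300 columns can hurt reads: until a major compaction removes them, every delete marker is a stored entry that a read of the row has to step over, whether or not the column it targets ever existed. The sketch counts stored entries for one hypothetical row under two policies (a marker for every requested column vs. a marker only for present columns). It ignores versions, timestamps, block caching, bloom filters and seek optimizations, so the numbers illustrate the ratio, not real latencies.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one row in one store before a major compaction.
// Ignores versions, timestamps, block cache, bloom filters and seek hints.
class DeleteMarkerCost {

    /** What is physically stored for one qualifier. */
    static final class Slot {
        boolean marker; // a column delete marker was written
        boolean cell;   // a data cell exists (masked if marker is also set)
    }

    /**
     * Builds a hypothetical row with 'requested' candidate qualifiers, every
     * (requested / present)-th of which holds a cell. With markAll, a marker is
     * written for every candidate; otherwise only for the ones holding a cell.
     */
    static TreeMap<String, Slot> buildRow(int requested, int present, boolean markAll) {
        if (present <= 0 || present > requested) {
            throw new IllegalArgumentException("need 0 < present <= requested");
        }
        TreeMap<String, Slot> row = new TreeMap<>();
        int step = requested / present;
        for (int i = 0; i < requested; i++) {
            Slot s = new Slot();
            s.cell = (i % step == 0);
            s.marker = markAll || s.cell;
            if (s.cell || s.marker) {
                row.put(String.format("q%03d", i), s);
            }
        }
        return row;
    }

    /** Stored entries a full-row read has to step over: markers plus masked cells. */
    static int entriesVisited(Map<String, Slot> row) {
        int n = 0;
        for (Slot s : row.values()) {
            if (s.marker) n++;
            if (s.cell) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("marker for every requested column: "
                + entriesVisited(buildRow(300, 10, true)));   // 310
        System.out.println("marker only for present columns:   "
                + entriesVisited(buildRow(300, 10, false)));  // 20
    }
}
```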
>
> On Fri, Feb 8, 2013 at 9:55 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > For the 300 column deletes, can you show us how the Delete(s) are
> > constructed ?
> >
> > Do you use this method?
> >
> >   public Delete deleteColumns(byte [] family, byte [] qualifier)
> >
> > Thanks
> >
> > On Fri, Feb 8, 2013 at 9:44 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >
> > > So a Get call with multiple columns on a single row should be much faster
> > > than independent Gets on each of those columns for that row. I am
> > > seeing severely poor performance (~15 seconds) for certain
> > > deleteColumn() calls, and I see that there is a
> > > prepareDeleteTimestamps() function in HRegion.java which first tries to
> > > locate each column by doing an individual get on every column you want to
> > > delete (I am doing 300 column deletes). I think this should ideally be
> > > one get call with the batch of 300 columns, so that one scan can retrieve
> > > the columns, and the columns that are found are then deleted.
> > >
> > > Before I try this fix, I wanted to get an opinion on whether batching the
> > > get() will make a difference, and from your answer it seems it should.
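The batching idea can be sketched with a simple cost model: one lookup per column, each starting from the beginning of the row, versus a single forward pass that resolves every requested column at once and drops the ones that are absent. This is not HBase's prepareDeleteTimestamps() code; the Cell class, the steps counter and the linear-scan cost are assumptions made for illustration (real gets seek via block indexes rather than scanning from the row start).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Cost model, not HBase code: compares one lookup per column with one
// forward pass over the row. 'steps' counts cells examined.
class BatchedTimestampLookup {

    /** A cell reduced to its qualifier and latest timestamp. */
    static final class Cell {
        final String qualifier;
        final long ts;
        Cell(String qualifier, long ts) { this.qualifier = qualifier; this.ts = ts; }
    }

    static int steps;

    /** One lookup per wanted qualifier, each scanning from the start of the (sorted) row. */
    static Map<String, Long> perColumn(List<Cell> row, Collection<String> wanted) {
        Map<String, Long> found = new TreeMap<>();
        for (String q : wanted) {
            for (Cell c : row) {
                steps++;
                int cmp = c.qualifier.compareTo(q);
                if (cmp == 0) { found.put(q, c.ts); break; }
                if (cmp > 0) break; // passed where q would sort: q is absent
            }
        }
        return found;
    }

    /** One forward pass over the (sorted) row, resolving all wanted qualifiers together. */
    static Map<String, Long> batched(List<Cell> row, Collection<String> wanted) {
        Map<String, Long> found = new TreeMap<>();
        TreeSet<String> todo = new TreeSet<>(wanted);
        for (Cell c : row) {
            if (todo.isEmpty()) break; // everything resolved, stop early
            steps++;
            todo.headSet(c.qualifier, false).clear(); // wanted but sorted before c: absent
            if (todo.remove(c.qualifier)) found.put(c.qualifier, c.ts);
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical row: 300 wanted qualifiers, only every 30th one stored.
        List<Cell> row = new ArrayList<>();
        List<String> wanted = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            String q = String.format("q%03d", i);
            wanted.add(q);
            if (i % 30 == 0) row.add(new Cell(q, 1000L + i));
        }
        steps = 0;
        Map<String, Long> a = perColumn(row, wanted);
        System.out.println("per-column: found " + a.size() + ", cells examined " + steps);
        steps = 0;
        Map<String, Long> b = batched(row, wanted);
        System.out.println("batched:    found " + b.size() + ", cells examined " + steps);
    }
}
```

Both methods find the same columns; only the batched one touches each stored cell at most once, and columns it never finds simply get no marker.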
> > >
> > > On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > >
> > > > Everything is stored as a KeyValue in HBase.
> > > > The Key part of a KeyValue contains the row key, column family, column
> > > > name, and timestamp, in that order.
> > > > Each column family has its own store and store files.
> > > >
> > > > So, in a nutshell, a get is executed by starting a scan at the row key
> > > > (which is a prefix of the key) in each store (CF) and then scanning
> > > > forward in each store until the next row key is reached. (In reality it
> > > > is a bit more complicated due to multiple versions, skipping columns,
> > > > etc.)
> > > >
> > > >
> > > > -- Lars
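The key ordering and the "get is a scan" description above can be modelled with one sorted map per column family. Assumptions: keys use Strings instead of byte arrays (HBase compares unsigned bytes, which agrees with String order for ASCII), the KeyValue type byte is omitted, and timestamps sort in descending order so the newest version of a column comes first, which is how HBase orders versions. The get below returns every stored version on the row and does none of the version, delete or column filtering a real get does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the description above: sorted keys, one store per column family,
// and a get executed as a short scan that starts at the row.
class KeyValueModel {

    /** Simplified key: row, family, qualifier, timestamp (no type byte). */
    static final class Key {
        final String row, family, qualifier;
        final long ts;
        Key(String row, String family, String qualifier, long ts) {
            this.row = row; this.family = family; this.qualifier = qualifier; this.ts = ts;
        }
        @Override public String toString() { return row + "/" + family + ":" + qualifier + "/" + ts; }
    }

    /** Row, family, qualifier ascending; timestamp descending (newest version first). */
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.ts, a.ts);
    };

    /** One sorted map per column family ("each column family has its own store"). */
    final Map<String, TreeMap<Key, String>> stores = new TreeMap<>();

    void put(String row, String family, String qualifier, long ts, String value) {
        stores.computeIfAbsent(family, f -> new TreeMap<Key, String>(ORDER))
              .put(new Key(row, family, qualifier, ts), value);
    }

    /** Scan each store from the first possible key of 'row' until the row key changes. */
    List<String> get(String row) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, TreeMap<Key, String>> store : stores.entrySet()) {
            // Smallest key on this row: empty qualifier, largest timestamp.
            Key start = new Key(row, store.getKey(), "", Long.MAX_VALUE);
            for (Map.Entry<Key, String> e : store.getValue().tailMap(start, true).entrySet()) {
                if (!e.getKey().row.equals(row)) break; // reached the next row
                out.add(e.getKey() + "=" + e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        KeyValueModel table = new KeyValueModel();
        table.put("r1", "cf1", "b", 1L, "old");
        table.put("r1", "cf1", "b", 2L, "new");
        table.put("r1", "cf1", "a", 1L, "x");
        table.put("r2", "cf1", "a", 1L, "other row");
        table.put("r1", "cf2", "z", 5L, "y");
        // Key order printed: r1/cf1:a/1, r1/cf1:b/2, r1/cf1:b/1, r1/cf2:z/5
        table.get("r1").forEach(System.out::println);
    }
}
```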
> > > > ________________________________
> > > > From: Varun Sharma <[EMAIL PROTECTED]>
> > > > To: [EMAIL PROTECTED]
> > > > Sent: Friday, February 8, 2013 9:22 PM
> > > > Subject: Re: Get on a row with multiple columns
> > > >
> > > > Sorry, I was a little unclear with my question.
> > > >
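> > > > Let's say you have
> > > >
> > > > // row and family are byte[]; Bytes is org.apache.hadoop.hbase.util.Bytes
> > > > Get get = new Get(row);
> > > > get.addColumn(family, Bytes.toBytes("1"));
> > > > get.addColumn(family, Bytes.toBytes("2"));
> > > > // ...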
> > > >
> > > > When HBase internally executes the batched get, it will seek to column
> > > > "1". Since the data is sorted lexicographically, it does not need to
> > > > seek from the beginning to get to "2"; it can keep seeking forward,
> > > > because column "2" always comes after column "1". I want to know
> > > > whether this is how a multi-column get on a row works.
> > > >
> > > > Thanks
> > > > Varun
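The forward-only seek described in the question can be modelled with a single cursor over a row's sorted qualifiers: sort the requested columns first (HBase orders qualifiers as unsigned byte arrays, the order Bytes.BYTES_COMPARATOR implements), then only ever advance the cursor. This illustrates the idea being asked about, not HBase's scanner implementation; the get() method and the sample qualifiers are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustration of a multi-column get on one row using a single forward-only cursor.
class ForwardOnlyGet {

    /** Lexicographic comparison of unsigned bytes. */
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    /**
     * Returns the requested qualifiers present in 'row'. 'row' must already be sorted
     * by UNSIGNED; the requested columns are sorted here, so the cursor only moves forward.
     */
    static List<String> get(List<byte[]> row, List<byte[]> requested) {
        NavigableSet<byte[]> wanted = new TreeSet<>(UNSIGNED);
        wanted.addAll(requested);
        List<String> found = new ArrayList<>();
        Iterator<byte[]> cursor = row.iterator();
        byte[] current = cursor.hasNext() ? cursor.next() : null;
        for (byte[] q : wanted) {
            // Advance, never rewind, until the cursor is at or past q.
            while (current != null && UNSIGNED.compare(current, q) < 0) {
                current = cursor.hasNext() ? cursor.next() : null;
            }
            if (current == null) break; // end of row: remaining columns are absent
            if (UNSIGNED.compare(current, q) == 0) {
                found.add(new String(q, StandardCharsets.UTF_8));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<byte[]> row = List.of(b("1"), b("2"), b("4"));
        // Requested out of order; they are walked as "1", "2", "3", "5".
        List<byte[]> requested = List.of(b("3"), b("1"), b("5"), b("2"));
        System.out.println(get(row, requested)); // [1, 2]
    }
}
```

Because each requested column sorts at or after the previous one, the cursor passes over the row at most once, no matter how many columns are requested.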
> > > >
> > > > On Fri, Feb 8, 2013 at 9:08 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Like Ishan said, a get gives you an instance of the Result class.
> > > > > All utility methods that you can use are:

Varun Sharma 2013-02-09, 06:16
Ted 2013-02-09, 06:29
lars hofhansl 2013-02-09, 06:34
Mrudula Madiraju 2013-08-14, 03:52