HBase >> mail # user >> Re: Get on a row with multiple columns


Thread:
  Varun Sharma 2013-02-09, 05:22
  lars hofhansl 2013-02-09, 05:34
  Varun Sharma 2013-02-09, 05:44
  Ted Yu 2013-02-09, 05:55
  Varun Sharma 2013-02-09, 06:05
  lars hofhansl 2013-02-09, 06:33
  Varun Sharma 2013-02-09, 06:45
  Varun Sharma 2013-02-09, 06:57
  lars hofhansl 2013-02-09, 07:31
  lars hofhansl 2013-02-09, 07:41
  lars hofhansl 2013-02-09, 07:57
  Varun Sharma 2013-02-09, 08:05
  Varun Sharma 2013-02-09, 08:11
  lars hofhansl 2013-02-09, 08:17
  Varun Sharma 2013-02-09, 08:29
  Jean-Marc Spaggiari 2013-02-09, 13:02
  lars hofhansl 2013-02-09, 16:46
  Varun Sharma 2013-02-10, 22:35
  Anoop Sam John 2013-02-11, 12:50
  Varun Sharma 2013-02-11, 15:36
  Varun Sharma 2013-02-11, 16:44
  Varun Sharma 2013-02-11, 16:44
  Ted Yu 2013-02-09, 06:09
  Varun Sharma 2013-02-09, 06:16

Re: Get on a row with multiple columns
How often do you need to perform such a delete operation?

Is there a way to utilize TTL so that you can avoid deletions?

Pardon me for not knowing your use case very well.
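If the deleted data is simply aging out, a column-family TTL would let compactions expire cells without any client-side deletes. A rough sketch in the HBase shell (the table and family names here are invented; TTL is in seconds):

```
disable 'mytable'
alter 'mytable', {NAME => 'cf', TTL => 604800}   # expire cells after 7 days
enable 'mytable'
```

On 0.94 this normally requires disabling the table first, since online schema changes are not enabled by default.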

On Feb 8, 2013, at 10:16 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Using hbase 0.94.3. Tried that too, but ran into performance issues from
> having to retrieve the entire row first (this was getting slow when one
> particular row was hammered), since a row can be big (a few megs, sometimes
> 10s of megs), and then finding the columns and then doing a delete.
>
> To me, it looks like the current implementation of deleteColumn is
> suboptimal because of the 300 gets vs doing 1.
>
> Thanks
> Varun
>
> On Fri, Feb 8, 2013 at 10:09 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> Which HBase version are you using?
>>
>> Is there a way to place 10 delete markers from the application side instead
>> of 300?
>>
>> Thanks
>>
>> On Fri, Feb 8, 2013 at 10:05 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>>
>>> We are given a set of 300 columns to delete. I tested two cases:
>>>
>>> 1) deleteColumns() - with the 's'
>>>
>>> This function simply adds delete markers for all 300 columns; in our case,
>>> typically only a fraction of these columns (around 10) are actually
>>> present. After starting to use deleteColumns, we started seeing a drop in
>>> cluster-wide random read performance - the 90th percentile latency
>>> worsened, as did the 99th, probably because of having to traverse delete
>>> markers. I attribute this to a profusion of delete markers in the cluster.
>>> Major compactions slowed down by almost 50 percent, probably because of
>>> having to clean out significantly more delete markers.
>>>
>>> 2) deleteColumn()
>>>
>>> Ended up with intolerable 15-second calls, which clogged all the handlers,
>>> making the cluster pretty much unresponsive.
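To make the asymmetry between the two calls concrete, here is a small self-contained model (not HBase code; the class, fields, and counts are invented for illustration) of why a deleteColumns-style delete writes one marker per requested column with no reads, while a deleteColumn-style delete must first look up each column to find its latest version:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the two delete flavors discussed above. This is NOT HBase
// code; it only illustrates the write/read asymmetry between them.
public class DeleteModel {
    // "Store": column qualifier -> latest value, sorted like an HFile.
    static TreeMap<String, String> row = new TreeMap<>();
    static List<String> markers = new ArrayList<>();
    static int reads = 0;

    // deleteColumns-style: blindly writes one marker per requested column,
    // with zero reads, whether or not the column exists.
    static void deleteColumns(List<String> qualifiers) {
        for (String q : qualifiers) markers.add("DeleteColumn:" + q);
    }

    // deleteColumn-style (simplified): must read to locate the latest cell
    // of each column before placing a marker for it.
    static void deleteColumn(List<String> qualifiers) {
        for (String q : qualifiers) {
            reads++;                       // one point lookup per column
            if (row.containsKey(q)) markers.add("Delete:" + q);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) row.put("col" + i, "v");   // 10 present
        List<String> toDelete = new ArrayList<>();
        for (int i = 0; i < 300; i++) toDelete.add("col" + i);  // 300 requested

        deleteColumns(toDelete);
        System.out.println("deleteColumns markers: " + markers.size()); // 300

        markers.clear();
        deleteColumn(toDelete);
        System.out.println("deleteColumn markers: " + markers.size()
                + ", reads: " + reads); // 10 markers, 300 reads
    }
}
```

The model matches the trade-off described above: option (1) floods the store with markers for absent columns, while option (2) pays a read per requested column.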
>>>
>>> On Fri, Feb 8, 2013 at 9:55 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>>
>>>> For the 300 column deletes, can you show us how the Delete(s) are
>>>> constructed?
>>>>
>>>> Do you use this method?
>>>>
>>>>  public Delete deleteColumns(byte [] family, byte [] qualifier) {
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Feb 8, 2013 at 9:44 PM, Varun Sharma <[EMAIL PROTECTED]>
>>> wrote:
>>>>
>>>>> So a Get call with multiple columns on a single row should be much
>>>>> faster than independent Get(s) on each of those columns for that row. I
>>>>> am basically seeing severely poor performance (~ 15 seconds) for certain
>>>>> deleteColumn() calls, and I see that there is a
>>>>> prepareDeleteTimestamps() function in HRegion.java which first tries to
>>>>> locate the column by doing individual gets on each column you want to
>>>>> delete (I am doing 300 column deletes). Now, I think this should ideally
>>>>> be 1 get call with the batch of 300 columns, so that one scan can
>>>>> retrieve the columns, and the columns that are found are then deleted.
>>>>>
>>>>> Before I try this fix, I wanted to get an opinion on whether it will
>>>>> make a difference to batch the get(), and it seems from your answer
>>>>> that it should.
>>>>>
>>>>> On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <[EMAIL PROTECTED]>
>>> wrote:
>>>>>
>>>>>> Everything is stored as a KeyValue in HBase.
>>>>>> The Key part of a KeyValue contains the row key, column family, column
>>>>>> name, and timestamp, in that order.
>>>>>> Each column family has its own store and store files.
>>>>>>
>>>>>> So in a nutshell, a get is executed by starting a scan at the row key
>>>>>> (which is a prefix of the key) in each store (CF) and then scanning
>>>>>> forward in each store until the next row key is reached. (In reality
>>>>>> it is a bit more complicated due to multiple versions, skipping
>>>>>> columns, etc.)
>>>>>>
>>>>>>
>>>>>> -- Lars
>>>>>> ________________________________
>>>>>> From: Varun Sharma <[EMAIL PROTECTED]>
>>>>>> To: [EMAIL PROTECTED]
>>>>>> Sent: Friday, February 8, 2013 9:22 PM
>>>>>> Subject: Re: Get on a row with multiple columns
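Lars's point that a get is really a seek followed by a short forward scan can be sketched with a plain sorted map standing in for a store's sorted KeyValues (the string key layout and names here are a made-up simplification, not HBase's actual key encoding):

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Rough model of "a get is a scan": KeyValues sorted by
// (row, family, qualifier), and a point get seeks to the row's first
// possible key and scans forward until the next row key is reached.
public class GetAsScan {
    public static SortedMap<String, String> get(
            TreeMap<String, String> store, String row) {
        // All keys with prefix "row/" sort between "row/" and "row0"
        // ('0' is the character right after '/'), so one subMap range
        // plays the role of seek-then-scan-forward.
        return store.subMap(row + "/", row + "0");
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("row1/cf/a", "1");
        store.put("row1/cf/b", "2");
        store.put("row2/cf/a", "3");

        System.out.println(get(store, "row1").keySet()); // [row1/cf/a, row1/cf/b]
    }
}
```

This is also why one batched get of many columns in a row can beat many point gets: a single forward scan over the row visits all its columns in order, instead of re-seeking once per column.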
  lars hofhansl 2013-02-09, 06:34
  Mrudula Madiraju 2013-08-14, 03:52