NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # dev >> Sporadic memstore slowness for Read Heavy workloads


Re: Sporadic memstore slowness for Read Heavy workloads
I see you figured it out. I should have read all the email before sending my last reply.

________________________________
 From: Varun Sharma <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
Sent: Tuesday, January 28, 2014 9:43 AM
Subject: Re: Sporadic memstore slowness for Read Heavy workloads
 
OK, I think I understand this better now. So at step #3 the order will actually be something like this:

(ROW, <DELETE>, T=2)
(ROW, COL1, T=3)
(ROW, COL1, T=1)  - filtered

(ROW, COL2, T=3)
(ROW, COL2, T=1)  - filtered
(ROW, COL3, T=3)
(ROW, COL3, T=1)  - filtered

The ScanDeleteTracker class would simply filter out the cells whose timestamp is <= 2, i.e. at or before the delete marker's timestamp.

Varun
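The filtering step described in this message can be sketched as below. This is a simplified, hypothetical model, not HBase's actual ScanDeleteTracker: the `FamilyDeleteFilter` class, the `Cell` record, the null-qualifier stand-in for `<DELETE>`, and the `visibleCells` method are all illustrative assumptions. Cells arrive already in the scan order listed above; once a row's family delete marker has been seen, every later cell in that row with a timestamp at or before the marker's timestamp is dropped. HBase documents family deletes (`Delete.addFamily(family, timestamp)`) as covering cells with a timestamp less than or equal to the given one, which is why the comparison is `<=`.

```java
import java.util.ArrayList;
import java.util.List;

public class FamilyDeleteFilter {
    // Hypothetical simplified cell: row, qualifier, timestamp.
    // A null qualifier stands in for the family delete marker (<DELETE>).
    record Cell(String row, String qualifier, long ts) {
        boolean isFamilyDelete() { return qualifier == null; }

        @Override
        public String toString() {
            return "(" + row + ", " + (qualifier == null ? "<DELETE>" : qualifier) + ", T=" + ts + ")";
        }
    }

    // Walks cells that are already in scan order and drops those masked by a
    // family delete marker of the same row. Sketch only, not HBase code.
    static List<Cell> visibleCells(List<Cell> sorted) {
        List<Cell> out = new ArrayList<>();
        String currentRow = null;
        long familyStamp = Long.MIN_VALUE; // no family delete seen yet in this row
        for (Cell c : sorted) {
            if (!c.row().equals(currentRow)) { // new row: reset the delete state
                currentRow = c.row();
                familyStamp = Long.MIN_VALUE;
            }
            if (c.isFamilyDelete()) {
                familyStamp = Math.max(familyStamp, c.ts());
                continue; // the marker itself is never returned
            }
            if (c.ts() <= familyStamp) {
                continue; // masked: written at or before the family delete
            }
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same order as the step #3 listing above.
        List<Cell> scanOrder = List.of(
                new Cell("ROW", null, 2),
                new Cell("ROW", "COL1", 3), new Cell("ROW", "COL1", 1),
                new Cell("ROW", "COL2", 3), new Cell("ROW", "COL2", 1),
                new Cell("ROW", "COL3", 3), new Cell("ROW", "COL3", 1));
        System.out.println(visibleCells(scanOrder));
        // prints [(ROW, COL1, T=3), (ROW, COL2, T=3), (ROW, COL3, T=3)]
    }
}
```

Because the marker sorts first in its row, a single forward pass is enough: the tracker learns the family delete timestamp before it sees any column of that row.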

On Tue, Jan 28, 2014 at 9:04 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

>Lexicographically, (ROW, COL2, T=3) should come after (ROW, COL1, T=1), because COL2 > COL1 lexicographically. However, in the above example it comes before the delete marker, and hence before (ROW, COL1, T=1), which is wrong, no?
>
>
>
>On Tue, Jan 28, 2014 at 9:01 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>>bq. Now, clearly there will be columns above the delete marker which are
>>smaller than the ones below it.
>>
>>This is where a closer look is needed. Part of the confusion arises from
>>the usage of > and < in your example.
>>(ROW, COL2, T=3) would sort before (ROW, COL1, T=1).
>>
>>Here, in terms of sort order, 'above' means before and 'below it' means
>>after. So 'smaller' means before.
>>
>>Cheers
>>
>>
>>
>>On Tue, Jan 28, 2014 at 8:47 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Ted,
>>>
>>> I'm not satisfied with your answer: the document you sent does not cover
>>> the sort order of the Delete ColumnFamily marker. For the delete family marker to
>>> work, it has to mask *all* columns of a family. Hence it has to be above
>>> all the older columns. All the new columns must come above this column
>>> family delete marker. Now, clearly there will be columns above the delete
>>> marker which are smaller than the ones below it.
>>>
>>> The document says nothing about delete marker order; could you answer the
>>> question by walking through the example?
>>>
>>> Varun
>>>
>>>
>>> On Tue, Jan 28, 2014 at 5:09 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>>
>>> > Varun:
>>> > Take a look at http://hbase.apache.org/book.html#dm.sort
>>> >
>>> > There's no contradiction.
>>> >
>>> > Cheers
>>> >
>>> > On Jan 27, 2014, at 11:40 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>>> >
>>> > > Actually, I now have another question because of the way our workload
>>> > > is structured. We use a wide schema, and each time we write, we delete the
>>> > > entire row and write a fresh set of columns; we want to make sure no old
>>> > > columns survive. So I just want to check whether my picture of the
>>> > > memstore at this point is correct. My understanding is that the memstore
>>> > > is basically a skip list of KeyValues, ordered using the KeyValue
>>> > > comparator.
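The memstore picture in this paragraph, and the three steps that follow it, can be modeled with a skip list. This is a minimal sketch under stated assumptions, not HBase's real KeyValue layout: the `MemstoreOrder` class, the `Key` record, and `buildMemstore` are hypothetical; the column family and key type are omitted; and the family delete marker is assumed to carry an empty qualifier (as HBase's DeleteFamily marker does). The comparator follows the KeyValue rule of row ascending, then qualifier ascending, then timestamp descending.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

public class MemstoreOrder {
    // Hypothetical simplified key: family and key type are omitted, and an
    // empty qualifier stands in for the family delete marker.
    record Key(String row, String qualifier, long ts) {
        @Override
        public String toString() {
            String q = qualifier.isEmpty() ? "<DELETE>" : qualifier;
            return "(" + row + ", " + q + ", T=" + ts + ")";
        }
    }

    // Row ascending, then qualifier ascending (the empty qualifier sorts first),
    // then timestamp descending so newer versions come first.
    // String.compareTo matches byte order here only because the keys are ASCII.
    static final Comparator<Key> KEY_ORDER = Comparator
            .comparing(Key::row)
            .thenComparing(Key::qualifier)
            .thenComparing(Comparator.comparingLong(Key::ts).reversed());

    // Replays the three steps from the email against a skip-list "memstore".
    static ConcurrentSkipListSet<Key> buildMemstore() {
        ConcurrentSkipListSet<Key> memstore = new ConcurrentSkipListSet<>(KEY_ORDER);
        String[] cols = {"COL1", "COL2", "COL3"};
        for (String col : cols) memstore.add(new Key("ROW", col, 1)); // step 1, T=1
        memstore.add(new Key("ROW", "", 2)); // step 2, T=2: a marker is added, nothing is removed
        for (String col : cols) memstore.add(new Key("ROW", col, 3)); // step 3, T=3
        return memstore;
    }

    public static void main(String[] args) {
        buildMemstore().forEach(System.out::println);
        // prints, one per line:
        // (ROW, <DELETE>, T=2)
        // (ROW, COL1, T=3)
        // (ROW, COL1, T=1)
        // (ROW, COL2, T=3)
        // (ROW, COL2, T=1)
        // (ROW, COL3, T=3)
        // (ROW, COL3, T=1)
    }
}
```

Under these assumptions the marker lands first in the row because of its empty qualifier, not because of its timestamp, and the new T=3 cells interleave with the old T=1 cells column by column. This matches the order given in the January 28, 9:43 AM reply at the top of the thread.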
>>> > >
>>> > > 1) *T=1*: We write 3 columns for "ROW". So the memstore has:
>>> > >
>>> > > (ROW, COL1, T=1)
>>> > > (ROW, COL2, T=1)
>>> > > (ROW, COL3, T=1)
>>> > >
>>> > > 2) *T=2*: Now we write a delete marker for the entire ROW at T=2. My
>>> > > understanding is that we do not delete anything in the memstore but only
>>> > > add markers, so the memstore has:
>>> > >
>>> > > (ROW, <DELETE>, T=2)
>>> > > (ROW, COL1, T=1)
>>> > > (ROW, COL2, T=1)
>>> > > (ROW, COL3, T=1)
>>> > >
>>> > > 3) *T=3*: Now we write our fresh row for T=3; it should get inserted
>>> > > above the delete marker:
>>> > >
>>> > > (ROW, COL1, T=3)
>>> > > (ROW, COL2, T=3)
>>> > > (ROW, COL3, T=3)
>>> > > (ROW, <DELETE>, T=2)
>>> > > (ROW, COL1, T=1)
>>> > > (ROW, COL2, T=1)
>>> > > (ROW, COL3, T=1)
>>> > >
>>> > > This is the ideal scenario for the data to be correctly reflected.
>>> > >
>>> > > (ROW, COL2, T=3) *>* (ROW, <DELETE>, T=2) *>* (ROW, COL1, T=1) and
 