HBase, mail # dev - Sporadic memstore slowness for Read Heavy workloads


Re: Sporadic memstore slowness for Read Heavy workloads
Vladimir Rodionov 2014-01-28, 17:49
It is the responsibility of ScanQueryMatcher/ScanDeleteTracker to process
deletes during scanning.
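
For reference, here is a minimal Python sketch of the ordering worked out in the quoted message below. It is not HBase code: Cell, sort_key and scan are hypothetical names used only for illustration. It assumes the ordering the thread arrives at (row ascending, qualifier ascending, timestamp descending, with the family delete marker modeled as an empty qualifier so that it sorts first), plus a simple tracker that drops cells whose timestamp is at or below the marker's timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Toy stand-in for a KeyValue (hypothetical, not the HBase class)."""
    row: str
    qualifier: str  # assumption: "" models the family delete marker
    ts: int
    is_delete_family: bool = False


def sort_key(cell):
    # Row ascending, qualifier ascending, timestamp descending (newest first).
    return (cell.row, cell.qualifier, -cell.ts)


def scan(cells):
    """Return the visible cells in sorted order, applying family deletes."""
    deleted_up_to = {}  # row -> highest family-delete timestamp seen so far
    visible = []
    for cell in sorted(cells, key=sort_key):
        floor = deleted_up_to.get(cell.row, float("-inf"))
        if cell.is_delete_family:
            # The marker sorts ahead of every column of its row, so the
            # tracker knows about it before any column is examined.
            deleted_up_to[cell.row] = max(floor, cell.ts)
            continue
        if cell.ts <= floor:
            continue  # masked by the family delete marker
        visible.append(cell)
    return visible


# The three write steps from the thread, in arrival order.
memstore = [
    Cell("ROW", "COL1", 1), Cell("ROW", "COL2", 1), Cell("ROW", "COL3", 1),
    Cell("ROW", "", 2, is_delete_family=True),
    Cell("ROW", "COL1", 3), Cell("ROW", "COL2", 3), Cell("ROW", "COL3", 3),
]

for cell in sorted(memstore, key=sort_key):
    print(cell)
print([(c.qualifier, c.ts) for c in scan(memstore)])
```

In this model the new T=3 cells do not need to sit "above" the marker as a block; each column's versions stay together, and the marker is seen first only because of its empty qualifier.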
On Tue, Jan 28, 2014 at 9:43 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> OK, I think I understand this better now. So the order at step #3 will
> actually be something like this:
>
> (ROW, <DELETE>, T=2)
> (ROW, COL1, T=3)
> (ROW, COL1, T=1)  - filtered
> (ROW, COL2, T=3)
> (ROW, COL2, T=1)  - filtered
> (ROW, COL3, T=3)
> (ROW, COL3, T=1)  - filtered
>
> The ScanDeleteTracker class would simply filter out columns which have a
> timestamp <= 2.
>
> Varun
>
>
> On Tue, Jan 28, 2014 at 9:04 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>
> > Lexicographically, (ROW, COL2, T=3) should come after (ROW, COL1, T=1)
> > because COL2 > COL1 lexicographically. However, in the above example, it
> > comes before the delete marker and hence before (ROW, COL1, T=1), which is
> > wrong, no?
> >
> >
> > On Tue, Jan 28, 2014 at 9:01 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> >> bq. Now, clearly there will be columns above the delete marker which are
> >> smaller than the ones below it.
> >>
> >> This is where a closer look is needed. Part of the confusion arises from
> >> the usage of > and < in your example.
> >> (ROW, COL2, T=3) would sort before (ROW, COL1, T=1).
> >>
> >> Here, in terms of sort order, 'above' means before. 'below it' would mean
> >> after. So 'smaller' would mean before.
> >>
> >> Cheers
> >>
> >>
> >> On Tue, Jan 28, 2014 at 8:47 AM, Varun Sharma <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Hi Ted,
> >> >
> >> > I am not satisfied with your answer: the document you sent does not talk
> >> > about Delete ColumnFamily marker sort order. For the delete family marker
> >> > to work, it has to mask *all* columns of a family. Hence it has to be
> >> > above all the older columns. All the new columns must come above this
> >> > column family delete marker. Now, clearly there will be columns above the
> >> > delete marker which are smaller than the ones below it.
> >> >
> >> > The document says nothing about delete marker order. Could you answer the
> >> > question by looking through the example?
> >> >
> >> > Varun
> >> >
> >> >
> >> > On Tue, Jan 28, 2014 at 5:09 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > Varun:
> >> > > Take a look at http://hbase.apache.org/book.html#dm.sort
> >> > >
> >> > > There's no contradiction.
> >> > >
> >> > > Cheers
> >> > >
> >> > > On Jan 27, 2014, at 11:40 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >> > >
> >> > > > Actually, I now have another question because of the way our workload is
> >> > > > structured. We use a wide schema and each time we write, we delete the
> >> > > > entire row and write a fresh set of columns - we want to make sure no old
> >> > > > columns survive. So, I just want to see if my picture of the memstore at
> >> > > > this point is correct or not. My understanding is that the memstore is
> >> > > > basically a skip list of KeyValues and compares them using the KeyValue
> >> > > > comparator.
> >> > > >
> >> > > > 1) *T=1*, we write 3 columns for "ROW". So the memstore has:
> >> > > >
> >> > > > (ROW, COL1, T=1)
> >> > > > (ROW, COL2, T=1)
> >> > > > (ROW, COL3, T=1)
> >> > > >
> >> > > > 2) *T=2*, now we write a delete marker for the entire ROW at T=2. My
> >> > > > understanding is that we do not delete in the memstore but only add
> >> > > > markers, so the memstore has:
> >> > > >
> >> > > > (ROW, <DELETE>, T=2)
> >> > > > (ROW, COL1, T=1)
> >> > > > (ROW, COL2, T=1)
> >> > > > (ROW, COL3, T=1)
> >> > > >
> >> > > > 3) Now we write our fresh row for *T=3* - it should get inserted above
> >> > > > the delete.
> >> > > >
> >> > > > (ROW, COL1, T=3)
> >> > > > (ROW, COL2, T=3)
> >> > > > (ROW, COL3, T=3)
> >> > > > (ROW, <DELETE>, T=2)
> >> > > > (ROW, COL1, T=1)
> >> > > > (ROW, COL2, T=1)
> >> > > > (ROW, COL3, T=1)
> >> > > >
> >> > > > This is the ideal scenario for the data to be correctly reflected.