HBase >> mail # dev >> Sporadic memstore slowness for Read Heavy workloads


Re: Sporadic memstore slowness for Read Heavy workloads
It is the responsibility of ScanQueryMatcher and ScanDeleteTracker to process
deletes during scanning.
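
To make that concrete, here is a rough Python model of what a delete tracker does while a scan walks cells in sorted order. It is only a sketch, not HBase code: `Cell` and `visible_cells` are made-up names, a single column family is assumed, and the family delete marker is modelled as a cell with an empty qualifier. It also assumes the marker masks every cell whose timestamp is at or below the marker's own timestamp.

```python
from typing import List, NamedTuple, Optional


class Cell(NamedTuple):
    """Hypothetical stand-in for a KeyValue (single column family assumed)."""
    row: str
    qualifier: str            # "" for the family delete marker in this model
    timestamp: int
    is_family_delete: bool = False


def visible_cells(sorted_cells: List[Cell]) -> List[Cell]:
    """Return the cells a scan would surface, given cells in sort order."""
    current_row: Optional[str] = None
    delete_ts: Optional[int] = None
    result: List[Cell] = []
    for cell in sorted_cells:
        if cell.row != current_row:
            # New row: forget any marker seen for the previous row.
            current_row, delete_ts = cell.row, None
        if cell.is_family_delete:
            # Remember the newest marker; markers themselves are not returned.
            if delete_ts is None or cell.timestamp > delete_ts:
                delete_ts = cell.timestamp
            continue
        if delete_ts is not None and cell.timestamp <= delete_ts:
            continue  # masked by the delete marker (assumed "<=" semantics)
        result.append(cell)
    return result


# The step #3 ordering from the quoted message below.
example = [
    Cell("ROW", "", 2, is_family_delete=True),
    Cell("ROW", "COL1", 3), Cell("ROW", "COL1", 1),
    Cell("ROW", "COL2", 3), Cell("ROW", "COL2", 1),
    Cell("ROW", "COL3", 3), Cell("ROW", "COL3", 1),
]

for cell in visible_cells(example):
    print(cell.row, cell.qualifier, cell.timestamp)
```

Because the marker is seen before any column of the row, the scan needs only one pass and one remembered timestamp per row to drop the T=1 cells.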
On Tue, Jan 28, 2014 at 9:43 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> OK, I think I understand this better now. So the order at step #3 will
> actually be something like this:
>
> (ROW, <DELETE>, T=2)
> (ROW, COL1, T=3)
> (ROW, COL1, T=1)  - filtered
> (ROW, COL2, T=3)
> (ROW, COL2, T=1)  - filtered
> (ROW, COL3, T=3)
> (ROW, COL3, T=1)  - filtered
>
> The ScanDeleteTracker class would simply filter out cells whose timestamp
> is at or below the delete marker's timestamp (T=2).
>
> Varun
>
>
> On Tue, Jan 28, 2014 at 9:04 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>
> > Lexicographically, (ROW, COL2, T=3) should come after (ROW, COL1, T=1)
> > because COL2 > COL1 lexicographically. However, in the above example, it
> > comes before the delete marker and hence before (ROW, COL1, T=1), which is
> > wrong, no?
> >
> >
> > On Tue, Jan 28, 2014 at 9:01 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> >> bq. Now, clearly there will be columns above the delete marker which are
> >> smaller than the ones below it.
> >>
> >> This is where a closer look is needed. Part of the confusion arises from
> >> the usage of > and < in your example.
> >> (ROW, COL2, T=3) would sort before (ROW, COL1, T=1).
> >>
> >> Here, in terms of sort order, 'above' means before. 'below it' would mean
> >> after. So 'smaller' would mean before.
> >>
> >> Cheers
> >>
> >>
> >> On Tue, Jan 28, 2014 at 8:47 AM, Varun Sharma <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Hi Ted,
> >> >
> >> > I am not satisfied with your answer: the document you sent does not talk
> >> > about Delete ColumnFamily marker sort order. For the delete family marker
> >> > to work, it has to mask *all* columns of a family. Hence it has to be
> >> > above all the older columns. All the new columns must come above this
> >> > column family delete marker. Now, clearly there will be columns above the
> >> > delete marker which are smaller than the ones below it.
> >> >
> >> > The document says nothing about delete marker order. Could you answer
> >> > the question by looking through the example?
> >> >
> >> > Varun
> >> >
> >> >
> >> > On Tue, Jan 28, 2014 at 5:09 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > Varun:
> >> > > Take a look at http://hbase.apache.org/book.html#dm.sort
> >> > >
> >> > > There's no contradiction.
> >> > >
> >> > > Cheers
> >> > >
> >> > > On Jan 27, 2014, at 11:40 PM, Varun Sharma <[EMAIL PROTECTED]>
> >> > > wrote:
> >> > >
> >> > > > Actually, I now have another question because of the way our workload
> >> > > > is structured. We use a wide schema and each time we write, we delete
> >> > > > the entire row and write a fresh set of columns - we want to make sure
> >> > > > no old columns survive. So, I just want to see if my picture of the
> >> > > > memstore at this point is correct or not. My understanding is that the
> >> > > > Memstore is basically a skip list of KeyValues, ordered using the
> >> > > > KeyValue comparator.
> >> > > >
> >> > > > 1) *T=1*, we write 3 columns for "ROW". So the memstore has:
> >> > > >
> >> > > > (ROW, COL1, T=1)
> >> > > > (ROW, COL2, T=1)
> >> > > > (ROW, COL3, T=1)
> >> > > >
> >> > > > 2) *T=2*, now we write a delete marker for the entire ROW at T=2. My
> >> > > > understanding is that we do not delete in the memstore but only add
> >> > > > markers, so the memstore has:
> >> > > >
> >> > > > (ROW, <DELETE>, T=2)
> >> > > > (ROW, COL1, T=1)
> >> > > > (ROW, COL2, T=1)
> >> > > > (ROW, COL3, T=1)
> >> > > >
> >> > > > 3) Now we write our new fresh row for *T=3* - it should get inserted
> >> > > > above the delete.
> >> > > >
> >> > > > (ROW, COL1, T=3)
> >> > > > (ROW, COL2, T=3)
> >> > > > (ROW, COL3, T=3)
> >> > > > (ROW, <DELETE>, T=2)
> >> > > > (ROW, COL1, T=1)
> >> > > > (ROW, COL2, T=1)
> >> > > > (ROW, COL3, T=1)
> >> > > >
> >> > > > This is the ideal scenario for the data to be correctly reflected.
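
The ordering question in the three steps above can be checked with a small Python model. This is a sketch under stated assumptions, not the real KeyValue comparator: cells are plain `(row, qualifier, timestamp)` tuples, a single column family is assumed, the row delete marker is modelled as a cell with an empty qualifier, and `sort_key` is a made-up helper that orders by row, then qualifier, then newest timestamp first.

```python
# Model assumption: the delete marker written at T=2 carries an empty qualifier.
DELETE_MARKER = ""


def sort_key(cell):
    """Hypothetical sort key: row, then qualifier, then newest timestamp first."""
    row, qualifier, timestamp = cell
    return (row, qualifier, -timestamp)


# Writes in arrival order: step 1 (T=1), step 2 (T=2), step 3 (T=3).
writes = [
    ("ROW", "COL1", 1), ("ROW", "COL2", 1), ("ROW", "COL3", 1),
    ("ROW", DELETE_MARKER, 2),
    ("ROW", "COL1", 3), ("ROW", "COL2", 3), ("ROW", "COL3", 3),
]

# A sorted structure such as a skip list keeps cells in comparator order,
# regardless of the order in which they were written.
memstore_order = sorted(writes, key=sort_key)

for row, qualifier, timestamp in memstore_order:
    print((row, qualifier or "<DELETE>", f"T={timestamp}"))
```

Under these assumptions the marker ends up first in the row and each column's T=3 cell sits directly above its T=1 cell, which is the ordering arrived at later in the thread rather than the one listed in step 3.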

 