Re: delete operation with timestamp
Hi Lars,
>>You could look at the code :)
Did exactly that. Just wanted to be sure that I am not missing any insight.

>>Typically you won't add many columns with different time stamps as part
of the same put... You are right, though, it is not strictly needed.
Understood now.

Thanks for bearing with me Lars.

-Shrijeet
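
For anyone reading this thread later, here is a minimal sketch of the behaviour Lars describes in the quoted message: the put's TS only serves as the default for cells added without an explicit TS, and only the per-cell timestamps matter once the put reaches the server. The classes below (PutTimestampSketch, SketchPut, Cell, applyToMemstore) are hypothetical stand-ins written for illustration, not the HBase client or server classes, and the step that substitutes the server's clock for LATEST_TIMESTAMP is an assumption about server behaviour rather than something stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-ins for illustration only; these are NOT the HBase classes. */
class PutTimestampSketch {

    /** Assumed marker for "no explicit timestamp", modelled on LATEST_TIMESTAMP. */
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    /** One cell (the KeyValue in the thread): the timestamp lives here. */
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Client-side description of a put; its timestamp is only a default. */
    static final class SketchPut {
        final String row;
        final long defaultTimestamp;
        final List<Cell> cells = new ArrayList<>();

        SketchPut(String row) {
            this(row, LATEST_TIMESTAMP);
        }

        SketchPut(String row, long defaultTimestamp) {
            this.row = row;
            this.defaultTimestamp = defaultTimestamp;
        }

        /** No explicit TS: the cell inherits the put's default TS. */
        SketchPut add(String family, String qualifier, String value) {
            return add(family, qualifier, defaultTimestamp, value);
        }

        /** Explicit TS: the cell keeps its own TS, whatever the put's TS is. */
        SketchPut add(String family, String qualifier, long ts, String value) {
            cells.add(new Cell(family, qualifier, ts, value));
            return this;
        }
    }

    /** Server-side stand-in: reads only the cell timestamps, never the put's. */
    static List<Cell> applyToMemstore(SketchPut put, long serverNow) {
        List<Cell> memstore = new ArrayList<>();
        for (Cell c : put.cells) {
            // Assumption: an unset timestamp is replaced by the server's clock.
            long ts = (c.timestamp == LATEST_TIMESTAMP) ? serverNow : c.timestamp;
            memstore.add(new Cell(c.family, c.qualifier, ts, c.value));
        }
        return memstore;
    }

    public static void main(String[] args) {
        SketchPut put = new SketchPut("row1", 100L);
        put.add("cf", "a", "v1");        // inherits the put's TS
        put.add("cf", "b", 42L, "v2");   // carries its own TS
        for (Cell c : applyToMemstore(put, System.currentTimeMillis())) {
            System.out.println(c.family + ":" + c.qualifier + " @ " + c.timestamp);
        }
    }
}
```

In this model the put's TS is consumed entirely on the client when a cell is added, which is why nothing on the server side needs to look at it.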
On Mon, Nov 28, 2011 at 8:16 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> You could look at the code :)
>
>
> The time stamps that count are the ones on the KeyValues maintained in the
> put's familyMap (the set of KVs mapped to CFs).
>
> In fact, the put's TS is just a convenience used as the default TS for the
> added KVs; it is not used at the server.
> Typically you won't add many columns with different time stamps as part of
> the same put... You are right, though, it is not strictly needed.
>
>
> ----- Original Message -----
> From: Shrijeet Paliwal <[EMAIL PROTECTED]>
> To: lars hofhansl <[EMAIL PROTECTED]>
> Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Monday, November 28, 2011 5:49 PM
> Subject: Re: delete operation with timestamp
>
> Lars,
> Thank you for writing. It does make sense.
>
> >>So if you trigger a Put operation from the client and you change (say) 3
> >>columns, the server will insert 3 KeyValues into the Memstore, all of which
> >>carry the TS of the Put.
> What if I construct the Put object by making three calls to 'add' with my
> own timestamp:
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html#add(byte[], byte[], long, byte[])
> In such a case the KeyValue list members will have a different TS than
> the TS of the Put. What will be the meaning of the Put's TS on the server
> side now?
>
> >>Having the TS per cell (or KeyValue) is necessary to enforce ACID
> >>guarantees, which state that what you retrieve with Get is a set of
> >>KeyValues such that this combination of versions of KeyValues for this
> >>row existed together at a point in time. (Need to remember here that
> >>multiple Put operations could insert different columns for the same rowKey.)
> Yes, this totally makes sense. And my question is around this: what is the
> need to maintain a TS on the Put at all? Even if the client does not want to
> specify a timestamp, the burden of including the latest timestamp can be
> passed to the KeyValue object.
>
> -Shrijeet
>
> On Mon, Nov 28, 2011 at 5:33 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Shrijeet,
> >
> > you have to distinguish between the storage format and the client side
> > objects (KeyValue is an outlier of sorts, as it is used on both server
> > and client).
> > Timestamps are per cell (KeyValue).
> >
> >
> > A Put object is something you create on the client to describe a put
> > operation to be performed at the server.
> > The server will take the information from the Put and write the necessary
> > KeyValues into the Memstore (which will eventually be flushed to disk).
> >
> > So if you trigger a Put operation from the client and you change (say) 3
> > columns, the server will insert 3 KeyValues into the Memstore, all of
> > which carry the TS of the Put.
> >
> > Having the TS per cell (or KeyValue) is necessary to enforce ACID
> > guarantees, which state that what you retrieve with Get is a set of
> > KeyValues such that this combination of versions of KeyValues for this
> > row existed together at a point in time. (Need to remember here that
> > multiple Put operations could insert different columns for the same
> > rowKey.)
> >
> >
> > Makes sense?
> >
> > -- Lars
> >
> >
> > ----- Original Message -----
> > From: Shrijeet Paliwal <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> > Cc:
> > Sent: Monday, November 28, 2011 4:31 PM
> > Subject: Re: delete operation with timestamp
> >
> > Slightly off-topic, sorry.
> >
> > While we have attention on timestamps may I ask why HBase maintains a
> > timestamp at row level (initialized with LATEST_TIMESTAMP)?
> > In other words timestamp has meaning in context of a cell and HBase