Re: How to understand the TS of each data version?
Hi, Ted

Thanks for your response. This is also the approach I use to avoid the problem.

regards!

Yong
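
The layout Ted proposes in the quoted message below can be illustrated with a small in-memory sketch. This is only a toy model of one column family (row key, then supplier as column qualifier, then timestamp, then value), not the HBase client API; the class name NetworkSpeedSketch and its put/get methods are hypothetical names chosen for illustration, and the timestamps 1 and 2 are the application-assigned values from Ted's example.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory model of a "NetworkSpeed" column family; NOT the HBase API. */
public class NetworkSpeedSketch {
    // row key -> column qualifier (supplier) -> timestamp -> value
    private final Map<String, Map<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Store one cell version with an explicit, application-chosen timestamp. */
    public void put(String row, String supplier, long ts, String speed) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(supplier, s -> new TreeMap<>())
            .put(ts, speed);
    }

    /** Newest version of a cell at or before asOfTs, or null if none exists. */
    public String get(String row, String supplier, long asOfTs) {
        Map<String, NavigableMap<Long, String>> cols = rows.get(row);
        if (cols == null) {
            return null;
        }
        NavigableMap<Long, String> versions = cols.get(supplier);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> entry = versions.floorEntry(asOfTs);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        NetworkSpeedSketch table = new NetworkSpeedSketch();
        table.put("Tom", "c", 1L, "10K"); // supplier c measured at TS 1
        table.put("Tom", "d", 2L, "15K"); // supplier d measured at TS 2

        System.out.println(table.get("Tom", "c", Long.MAX_VALUE)); // latest for c
        System.out.println(table.get("Tom", "d", 1L));             // d as of TS 1
        System.out.println(table.get("Tom", "d", 2L));             // d as of TS 2
    }
}
```

Because each supplier is its own column, a speed is always attached to exactly one supplier, so the question of whether 15K belongs to c or d does not arise in this model.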
On Sat, Sep 28, 2013 at 4:31 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Can you make NetworkSpeed a column family?
>
> This way you can treat individual suppliers as columns within the column
> family.
> So for "user Tom has a new supplier d instead of supplier c and its speed
> is 15K":
>
> rk    NetworkSpeed
>       c          d
> Tom   {10K:1}
> Tom              {15K:2}
>
> In the example above, the numbers after the colon are the TS. If the speed
> is unknown, you can store a special marker in the Cell.
> I used two rows, but as you said, the two Cells can be written using one
> RPC call.
>
> This way, NetworkSupplier column is not needed.
>
> Cheers
>
>
> On Fri, Sep 27, 2013 at 3:04 PM, yonghu <[EMAIL PROTECTED]> wrote:
>
> > To Ted,
> >
> > --"Can you tell me why readings corresponding to different timestamps
> > would appear in the same row?"
> >
> > Does that mean the data versions that belong to the same row should at
> > least have the same timestamps?
> >
> > To add a row to HBase, I can use a single Put instance, for example
> > Put put = new Put("tom"), then put.addColumn("Network:Supplier", "c") and
> > put.addColumn("Network:Supplier", "d"). Hence the data versions will
> > have the same TS.
> >
> > However, I can also use multiple Put instances, one Put instance per
> > data version. For example, Put put1 = new Put("tom"),
> > put1.addColumn("Network:Supplier", "c"), and Put put2 = new Put("tom"),
> > put2.addColumn("Network:Supplier", "d"). In this situation, the data
> > versions that belong to the same row will have different TSs even if
> > logically they should have the same TS. This situation can happen when I
> > first know the name of the network supplier and only later get the speed
> > of that supplier.
> >
> > To lars,
> >
> > --"You have a single row with two columns?"
> >
> > This is just an example for discussion. I had a long discussion with
> > another person about how to understand the right data representation and
> > the semantics of TS in HBase. Your explanation is one possible scenario,
> > which means "user Tom has a new supplier d instead of supplier c and its
> > speed is 15K".
> > However, it is also possible that "user Tom has both suppliers c and d and
> > 15K may belong to supplier c, as the speed of supplier d is not tested yet."
> > The second interpretation is very tricky, and if it applies, we need to
> > redesign the database schema.
> >
> > So, I wonder:
> > 1. Are there any predefined semantics of TS in HBase, or are the
> > semantics of TS application-specific?
> > 2. Can anyone give rules for how to assign TS to data versions that
> > belong to the same row?
> >
> > regards!
> >
> > Yong
> >
> >
> >
> >
> >
> > On Fri, Sep 27, 2013 at 7:02 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> > > Not sure I follow.
> > > You have a single row with two columns?
> > > In your scenario you'd see that supplier c has 15k iff you query the
> > > latest data, which seems to be what you want.
> > > Note that you could also query as of TS 4 (c:20k), TS 3 (d:20k), TS 2
> > > (d:10k).
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ________________________________
> > >  From: yonghu <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]
> > > Sent: Friday, September 27, 2013 7:24 AM
> > > Subject: How to understand the TS of each data version?
> > >
> > >
> > > Hello,
> > >
> > > In my understanding, the timestamp of each data version is generated by
> > > the Put command. The value of the TS is either specified by the user or
> > > assigned by HBase itself. If the TS is generated by HBase, it only
> > > records when (the point in time) that data version was generated (it has
> > > no meaning to the application).
> > > However, if the TS is specified by the user, it may have a specific
> > > meaning to applications. The reason I ask this question is: how can I
> > > correctly understand the meaning of the following data? Suppose I have a