HBase >> mail # user >> How to understand the TS of each data version?


Re: How to understand the TS of each data version?
Hi, Ted

Thanks for your response. This is also the way I use to avoid the problem.

regards!

Yong
On Sat, Sep 28, 2013 at 4:31 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Can you make NetworkSpeed a column family?
>
> This way you can treat individual suppliers as columns within the column
> family.
> So for "user Tom has a new supplier d instead of supplier c and its speed
> is 15K":
>
> rk       NetworkSpeed
>           c            d
> Tom   {10K:1}
> Tom                 {15K:2}
>
> In the example above, the numbers after colon are TS. If the speed is
> unknown, you can store a special marker in the Cell.
> I used two rows, but as you said, the two Cells can be written using one
> RPC call.
>
> This way, NetworkSupplier column is not needed.
>
> Cheers
>
>
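The layout Ted describes can be pictured with a small in-memory model. This is only a sketch in plain Java, not the HBase client API: the class `NetworkSpeedModel`, its methods, and the `UNKNOWN` marker value are hypothetical names chosen for illustration. It shows one column family whose qualifiers are the suppliers, with each cell holding timestamped versions.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory model of the "NetworkSpeed" family layout; NOT the HBase API.
class NetworkSpeedModel {
    // Hypothetical marker for "speed not measured yet".
    static final String UNKNOWN = "UNKNOWN";

    // row key -> qualifier (supplier) -> timestamp -> value
    private final Map<String, Map<String, TreeMap<Long, String>>> family =
            new TreeMap<>();

    // Store one version of one cell under an explicit timestamp.
    void put(String row, String supplier, long ts, String speed) {
        family.computeIfAbsent(row, r -> new TreeMap<>())
              .computeIfAbsent(supplier, q -> new TreeMap<>())
              .put(ts, speed);
    }

    // Latest version of one cell, or null if the supplier was never written.
    String latest(String row, String supplier) {
        Map<String, TreeMap<Long, String>> columns = family.get(row);
        if (columns == null) {
            return null;
        }
        TreeMap<Long, String> versions = columns.get(supplier);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        NetworkSpeedModel table = new NetworkSpeedModel();
        table.put("Tom", "c", 1L, "10K");   // {10K:1} under supplier c
        table.put("Tom", "d", 2L, "15K");   // {15K:2} under supplier d
        System.out.println("c = " + table.latest("Tom", "c"));
        System.out.println("d = " + table.latest("Tom", "d"));
    }
}
```

Because each supplier has its own column, a speed value is always attached to exactly one supplier, so the reading "15K may belong to supplier c" cannot arise from this layout.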
> On Fri, Sep 27, 2013 at 3:04 PM, yonghu <[EMAIL PROTECTED]> wrote:
>
> > To Ted,
> >
> > --"Can you tell me why readings corresponding to different timestamps
> > would appear in the same row ?"
> >
> > Does that mean the data versions which belong to the same row should at
> > least have the same timestamps?
> >
> > To add a row to HBase, I can use a single Put instance, for example,
> > Put put = new Put("tom"), then put.addColumn("Network:Supplier","c")
> > and put.addColumn("Network:Supplier","d"). The data versions will then
> > have the same TS.
> >
> > However, I can also use multiple Put instances, one Put instance per
> > data version. For example, Put put1 = new Put("tom"),
> > put1.addColumn("Network:Supplier","c"). Put put2 = new Put("tom"),
> > put2.addColumn("Network:Supplier","d"). In this situation, the data
> > versions which belong to the same row will have different TSs even if
> > logically they should have the same TS. This situation can happen when I
> > first know the name of the network supplier and only later get the speed
> > of the supplier.
> >
> > To lars,
> >
> > --"You have a single row with two columns?"
> >
> > This is just an example for discussion. I had a long discussion with
> > another person about how to understand the right data representation and
> > the semantics of TS in HBase. Your explanation is one possible scenario,
> > which means "user Tom has a new supplier d instead of supplier c and its
> > speed is 15K".
> > However, it is also possible that "user Tom has both suppliers c and d and
> > 15K may belong to supplier c, as the speed of supplier d is not tested yet."
> > The second interpretation is very tricky, and if it happened, we would
> > need to redesign the database schema.
> >
> > So, I wonder:
> > 1. Are there any predefined semantics of TS in HBase, or are the semantics
> > of TS application-specific?
> > 2. Can anyone give any rules on how to assign TS to data versions which
> > belong to the same row?
> >
> > regards!
> >
> > Yong
> >
> >
> >
> >
> >
> > On Fri, Sep 27, 2013 at 7:02 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> > > Not sure I follow.
> > > You have a single row with two columns?
> > > In your scenario you'd see that supplier c has 15k iff you query the
> > > latest data, which seems to be what you want.
> > > Note that you could also query as of TS4 (c:20k), TS3 (d:20k), TS2
> > > (d:10k)
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
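Lars's point about querying "as of" a timestamp can be sketched with a single versioned cell. Again this is a plain-Java model and not the HBase client API (the real client exposes timestamp and time-range options on its read operations); `VersionedCell`, `getAsOf`, and `getLatest` are hypothetical names. The values for column d (10k at TS2, 20k at TS3) are the ones quoted in his message.

```java
import java.util.Map;
import java.util.TreeMap;

// One cell with multiple timestamped versions; a model, NOT the HBase API.
class VersionedCell {
    // Versions sorted by timestamp, oldest first.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Newest value whose timestamp is <= asOf, or null if none exists yet.
    String getAsOf(long asOf) {
        Map.Entry<Long, String> entry = versions.floorEntry(asOf);
        return entry == null ? null : entry.getValue();
    }

    // "Latest data" is simply an as-of read at the largest possible timestamp.
    String getLatest() {
        return getAsOf(Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        VersionedCell d = new VersionedCell();
        d.put(2L, "10k");
        d.put(3L, "20k");
        System.out.println("as of TS2: " + d.getAsOf(2L));
        System.out.println("as of TS3: " + d.getAsOf(3L));
        System.out.println("latest:    " + d.getLatest());
    }
}
```

In this model the store attaches no meaning to a timestamp beyond ordering the versions; whether TS3 means "measured at time 3" or something else is left to the application that wrote it.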
> > > ________________________________
> > >  From: yonghu <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]
> > > Sent: Friday, September 27, 2013 7:24 AM
> > > Subject: How to understand the TS of each data version?
> > >
> > >
> > > Hello,
> > >
> > > In my understanding, the timestamp of each data version is generated by
> > > the Put command. The value of TS is either specified by the user or
> > > assigned by HBase itself. If the TS is generated by HBase, it only records
> > > when (the point in time) that data version was generated (it has no
> > > meaning to the application).
> > > However, if the TS is specified by the user, it may have a specific
> > > meaning to applications. The reason why I want to ask this question is:
> > > How can I correctly understand the meaning of the following data? Suppose
> > > I have a