

Re: HBase Key Design : Doubt
No, you're right.
But if you just want to keep "500" as the value, you just have to set the number of versions to 1 for your table's column family... If you just want to keep 100, then you can insert with a reverse timestamp, so the last cell inserted will be hidden by the previous one.

JM

2012/10/11, Narayanan K <[EMAIL PROTECTED]>:
> Hi,
>
> I have 2 column families A and B in table T1.
>
> put 'T1', 'R1', 'A:qualf1', 100
> put 'T1', 'R1', 'B:qualf2', 200
>
> As per my understanding, the above is one row and one single version each
> for the 2 column families.
>
> If I do a put 'T1', 'R1', 'A:qualf1', 500, then there is another version
> for the rowkey pertaining to the combination {R1, A, qualf1}.
>
> Please correct me if I am wrong.
>
> Regards,
> Narayanan
>
> On Thu, Oct 11, 2012 at 1:02 AM, Doug Meil
> <[EMAIL PROTECTED]> wrote:
>
>> Correct.
>>
>> If you do 2 Puts for row key ABCD on different days, the second Put
>> logically replaces the first and the earlier Put becomes a previous
>> version. Unless you specifically want older versions, you won't get them
>> in either Gets or Scans.
>>
>> Definitely want to read this…
>>
>> http://hbase.apache.org/book.html#datamodel
>>
>> See this for more information about the internal KeyValue structure.
>>
>> http://hbase.apache.org/book.html#regions.arch
>> 9.7.5.4. KeyValue
>>
>> Older versions are kept around as long as the table descriptor says so
>> (e.g., max versions). See the StoreFile and Compactions entries in the
>> RefGuide for more information on the internals.
>>
>> On 10/10/12 3:24 PM, "Jerry Lam" <[EMAIL PROTECTED]> wrote:
>>
>> >Correct me if I'm wrong. The version applies to the individual cell (i.e.
>> >row key, column family and column qualifier), not to (row key, column
>> >family).
>> >
>> >On Wed, Oct 10, 2012 at 3:13 PM, Narayanan K <[EMAIL PROTECTED]>
>> >wrote:
>> >
>> >> Hi all,
>> >>
>> >> I have a use case wherein I need to find the unique count of some things
>> >> in HBase across dates.
>> >>
>> >> Say, on 1st Oct, ABCD appeared, hence I insert a row with rowkey ABCD.
>> >> On 2nd Oct, I get the same value ABCD and I don't want to redundantly
>> >> store the row again with a new rowkey ABCD for 2nd Oct,
>> >> i.e. I don't want to have 20121001ABCD and 20121002ABCD as 2
>> >> rowkeys in the table.
>> >>
>> >> E.g.: if I have 1st Oct and 2nd Oct as 2 column families and the number
>> >> of versions is set to 1, only 1 row, with rowkey ABCD, will be present
>> >> for both dates.
>> >> Hence, if I need to find the unique number of times ABCD appeared during
>> >> Oct 1 and Oct 2, I just need to take the row count of the row ABCD by
>> >> filtering over the 2 column families.
>> >> Similarly, if we have 10 date column families and I need to scan only
>> >> for 2 dates, then it scans only those store files having the specified
>> >> column families. This will make scanning faster.
>> >>
>> >> But the design problem here is that I can't add more column families to
>> >> the table each day.
>> >>
>> >> I would need to store data every day, and I read that HBase doesn't work
>> >> well with more than 3 column families.
>> >>
>> >> The other option is to have one single column family and store dates as
>> >> qualifiers: date:d1, date:d2, .... But here, if there are 30 date
>> >> qualifiers under the date column family, scanning a single date
>> >> qualifier, or maybe a range of 23 dates, will have to go through the
>> >> entire data of all the d1 to d30 qualifiers in the date column family,
>> >> which would be slower than having a separate column family for each date.
>> >>
>> >> Please share your thoughts on this, and any alternate design suggestions
>> >> you might have.
>> >>
>> >> Regards,
>> >> Narayanan
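The versioning semantics discussed in this thread can be sketched without a cluster. The following is a hypothetical, simplified Python model, not an HBase client: `ToyTable`, `reverse_ts` and `LONG_MAX` are illustrative names introduced here. It shows that versions belong to an individual cell {row, family, qualifier}, that a second Put becomes the latest version while a plain Get returns only that one, and the two ways suggested for keeping either 500 or 100.

```python
from collections import defaultdict

# Java's Long.MAX_VALUE; a common base for "reverse timestamps".
LONG_MAX = 2**63 - 1


def reverse_ts(event_time):
    """Later events get smaller timestamps, so the first write keeps the
    highest timestamp and stays the visible version."""
    return LONG_MAX - event_time


class ToyTable:
    """Hypothetical in-memory model of HBase cell versioning (not the HBase API).

    A cell is addressed by (row, family, qualifier) and keeps its own list of
    versions, newest (highest timestamp) first.
    """

    def __init__(self, max_versions):
        self.max_versions = max_versions  # family -> VERSIONS setting
        self.cells = defaultdict(list)  # (row, family, qualifier) -> [(ts, value)]
        self.clock = 0  # stand-in for the server's current time

    def put(self, row, column, value, ts=None):
        family, qualifier = column.split(":", 1)
        if ts is None:
            self.clock += 1
            ts = self.clock
        versions = self.cells[(row, family, qualifier)]
        # Writing the same timestamp again replaces that version.
        versions[:] = [(t, v) for t, v in versions if t != ts]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        # Simplification: trim eagerly. HBase hides excess versions on read
        # and removes them physically during compaction.
        del versions[self.max_versions[family]:]

    def get(self, row, column, versions=1):
        """Return up to `versions` values for one cell, newest first."""
        family, qualifier = column.split(":", 1)
        return [v for _, v in self.cells.get((row, family, qualifier), [])][:versions]


# Narayanan's example: versions are tracked per cell, not per row or family.
t = ToyTable({"A": 3, "B": 3})
t.put("R1", "A:qualf1", 100)
t.put("R1", "B:qualf2", 200)
t.put("R1", "A:qualf1", 500)
print(t.get("R1", "A:qualf1"))              # [500]  (a plain Get sees the latest)
print(t.get("R1", "A:qualf1", versions=3))  # [500, 100]
print(t.get("R1", "B:qualf2", versions=3))  # [200]

# First option: VERSIONS = 1 keeps only the latest value, 500.
one = ToyTable({"A": 1})
one.put("R1", "A:qualf1", 100)
one.put("R1", "A:qualf1", 500)
print(one.get("R1", "A:qualf1", versions=3))  # [500]

# Second option: reverse timestamps keep the first value, 100.
rev = ToyTable({"A": 1})
rev.put("R1", "A:qualf1", 100, ts=reverse_ts(1))
rev.put("R1", "A:qualf1", 500, ts=reverse_ts(2))
print(rev.get("R1", "A:qualf1"))  # [100]
```

In a real table, the corresponding HBase shell steps would be setting the family's VERSIONS attribute (for example `alter 'T1', NAME => 'A', VERSIONS => 1`) and, for the reverse-timestamp approach, passing an explicit timestamp as the fifth argument to `put`. Unlike this model, HBase enforces the version limit lazily: excess versions are filtered out at read time and removed during compaction.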

