HBase >> mail # user >> Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
Re: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
@Anoop: We recently finished the first phase of our POC, and it went quite well.
Now we are deciding which features to use in the final implementation. We are
still in research mode, trying out different options, including LZO and Snappy
compression. Yes, in POC V1 the custom mapper for my bulk loader was already
passing the same timestamp (current time in millis) for every cell in a
single row. I can easily change the loader to use 0L as the timestamp for
all data.
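
To make the effect of a constant timestamp concrete, here is a minimal sketch. It is a toy in-memory model, not the HBase client API: `CellStore`, its methods, and the counter-based clock are hypothetical names used only for illustration. It assumes a cell is addressed by (row, column, timestamp), so two writes to the same coordinate leave one logical version, while writes stamped with the current time accumulate versions.

```python
import itertools


class CellStore:
    """Toy model of HBase cell coordinates: (row, column, timestamp) -> value."""

    def __init__(self, clock=None):
        # Stand-in for "current time in millis"; a counter keeps the demo deterministic.
        self._clock = clock or itertools.count(1338300000000).__next__
        self._cells = {}

    def put(self, row, column, value, ts=None):
        # With no explicit timestamp, fall back to the clock (the thread notes
        # that HBase fills in the current time in millis in this case).
        if ts is None:
            ts = self._clock()
        # Same (row, column, ts) coordinate: the newer write replaces the older one.
        self._cells[(row, column, ts)] = value

    def versions(self, row, column):
        # Every stored version of one cell, newest timestamp first.
        found = [(ts, v) for (r, c, ts), v in self._cells.items()
                 if r == row and c == column]
        return sorted(found, reverse=True)


# Loader A: no explicit timestamp, so each reload adds another version.
timed = CellStore()
timed.put("row1", "cf:a", "v1")
timed.put("row1", "cf:a", "v2")

# Loader B: constant 0L timestamp, so a reload overwrites in place.
fixed = CellStore()
fixed.put("row1", "cf:a", "v1", ts=0)
fixed.put("row1", "cf:a", "v2", ts=0)

print(len(timed.versions("row1", "cf:a")))  # 2
print(fixed.versions("row1", "cf:a"))       # [(0, 'v2')]
```

The model deliberately ignores the column family's max-versions setting, TTLs, Deletes, and compaction, which also affect what is actually kept on disk.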

@Matt: We are using the Cloudera distribution at present, so I will need to
ask the Cloudera folks about the HBase version used in CDH4 (at present it's
0.92). I looked at the HBase site, and the current stable version is 0.92, so
it seems unlikely that 0.96 will be a stable release in the next 3-4
months. Anyway, any idea when HBase 0.96 is supposed to be released, and when
it will be considered stable?

> HBase-6093 seems to be very close to my suggestion. The only difference is
> that Matt mentioned in the description that it can only be used when all
> inserts are type=Put. Is aforementioned restriction due to HFileV2? I think
> deleting an entire row wouldn't be a problem. right?

Any inputs on the above question?

On Tue, May 29, 2012 at 9:26 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> Hi Anil,
>         As HBASE-4676 is not available as of now, maybe you can check the
> other encoders, DiffKeyDeltaEncoder or FastDiffDeltaEncoder.
> Please go through the javadoc of these and see what they do apart from
> compressing the timestamp parts. They do other nice stuff too, which will
> make your data smaller on disk.
>
> When HBASE-4676 arrives you can try it, as I think it would be closer to
> your need.
>
> Also, please make sure to set the timestamp to 0L in all your Puts. If you
> don't, HBase will set the current time in millis as the timestamp for each
> Put.
>
> -Anoop-
> ________________________________________
> From: Matt Corgan [[EMAIL PROTECTED]]
> Sent: Wednesday, May 30, 2012 5:16 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Disable timestamp in HBase Table a.k.a Disable Versioning in
> HBase Table
>
> >
> > Is this feature going to be part of any future release of HBase?
>
> i couldn't get it finished in time for 0.94, but i think it's very likely
> to be in 0.96, possibly with a backport to .94.  Scan speed should improve
> if i have time to optimize the cell comparators and collators
>
>
> On Tue, May 29, 2012 at 4:29 PM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Hi All,
> >
> > Sorry for late reply as i got stuck in other task at work on Friday and
> > skimming through the HBase-4676 took me a while.
> >
> > HBase-6093 seems to be very close to my suggestion. The only difference is
> > that Matt mentioned in the description that it can only be used when all
> > inserts are type=Put. Is aforementioned restriction due to HFileV2? I think
> > deleting an entire row wouldn't be a problem. right? I have very little
> > knowledge about HFileV2. I will try to read about HFileV2 soon.
> >
> > HBASE-4676 seems really cool. IMHO, the current issue is that writes and
> > scans are slow (scans slower by ~2x compared to NONE, assuming the trie
> > compresses by ~2-3x), and as per the JIRA, if the value/key ratio is big
> > then the trie won't have any impact. Is this feature going to be part of
> > any future release of HBase?  Awesome stuff Matt.
> >
> > @Anoop: You meant that i should use the feature in HBase-4676 and pass the
> > timestamp as 0L in each put. Right?
> >
> > Thanks all for your valuable time and inputs.
> > -Anil
> >
> >
> > On Thu, May 24, 2012 at 11:22 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Anil,
> > >
> > > I created HBASE-6093
> > > <https://issues.apache.org/jira/browse/HBASE-6093> with an idea that
> > > could solve this problem.  It could be a simple
> > > implementation for simple workloads, but gets harder to support for
> > > tables with TTLs, maxVersion > 1, Deletes, etc.  Maybe it can only be
> > > enabled if the other ColumnFamily settings are compatible.

Thanks & Regards,
Anil Gupta