Sukant Hajra 2012-07-15, 23:05
William Slacum 2012-07-16, 13:34
Adam Fuchs 2012-07-16, 17:37
Billie J Rinaldi 2012-07-16, 18:15
> > 3. Compressed reverse-timestamp using Unicode tricks?
> > ------------------------------------------------------
> > I see code in Accumulo like
> > // We're past the index column family, so return a term that will sort
> > // lexicographically last. The last unicode character should suffice
> > return new Text("\uFFFD");
> > which gets me thinking that I can probably pull off an impressively
> > compressed, but still lexically ordered, reverse timestamp using
> > Unicode trickery to get a gigantic radix. Is there any precedent for
> > this? I'm a little worried about running into corner cases with
> > Unicode encoding. Otherwise, I think it feels like a simple algorithm
> > that may not eat up much CPU in translation and might save disk space
> > at scale.
> > Or is this optimizing into the noise given the compression Accumulo
> > already does under the covers?
> I would think the compression would take care of this. If you try it and
> get an improvement, we'd be interested in seeing the results.
I think it is generally a good idea to use encoding techniques whenever
they're quick, effective, and easy. If you know something about your data
then you can usually do better than a general-purpose compression
algorithm. Slide 11 of my table design presentation (
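shows a few extra tricks that might help you out. Another possibility is to
use a two's complement representation for a fixed-precision number (e.g. a
long or an int), but flip the first (sign) bit, so that the big-endian bytes
compare as unsigned values in the same order as the numbers themselves.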
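
For illustration, here is a minimal sketch of the sign-bit trick in plain
Java. The class and method names (SortableLong, encode, decode,
encodeReverse, compareUnsigned) are made up for this example and are not
part of Accumulo, and the reverse-timestamp helper assumes non-negative
timestamps such as epoch milliseconds.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not an Accumulo API: fixed-width, sortable long encoding.
final class SortableLong {

    // Flip the sign bit and write the result big-endian (ByteBuffer's default order).
    // Negative values then start with a 0 bit and non-negative values with a 1 bit,
    // so an unsigned byte-by-byte comparison agrees with numeric order.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Inverse of encode: read the big-endian long and flip the sign bit back.
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Reverse timestamp: newer timestamps produce smaller keys and sort first.
    // Assumes timestampMillis >= 0, so the subtraction cannot overflow.
    static byte[] encodeReverse(long timestampMillis) {
        return encode(Long.MAX_VALUE - timestampMillis);
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

Each key is always 8 bytes, so this trades the large-radix compression idea
for a fixed width with no character-encoding corner cases. Whether that
matters after Accumulo's own compression is something you would have to
measure.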
David Medinets 2012-07-16, 22:45
Adam Fuchs 2012-07-17, 15:25
William Slacum 2012-07-16, 00:04